Preface


Post-editing has become an established task on the professional translation
market. The aim of this textbook is to provide you with basic knowledge about
post-editing so that you will be able to make educated decisions when you want
to work as a post-editor. It combines theoretical introductions with applied
knowledge. Like translating texts, post-editing requires a lot of practice
before you can work proficiently and professionally. Therefore, this textbook
can only be a starting point on your way to becoming a professional post-editor.

This book is intended for professional translators or translators in training. 
So, we presume you have basic translation skills and knowledge. If you are not 
a translator, you can still use the textbook. However, we will not provide any 
reference material for general translation issues. The textbook consists of ten 
chapters and will aim at giving you an overview of both practical and theoretical 
aspects. 


Acknowledgments 


This book is based on different papers, discussions, and courses we have written
and conducted in recent years. In particular, we want to highlight the Digiling
Online Course, which we developed during the Erasmus+ sponsored project phase
between 2016 and 2019 (see the website of the project). The idea of the project
was to establish training in Digital Linguistics. A needs analysis among market
players showed a growing demand for employees with training in this field, but
no European university offered a programme in Digital Linguistics. These were
the goals:¹


• Find out about the market’s needs (survey with market players).

• Create an internationally approved model curriculum for Digital Linguistics
  (combining old and new courses).

• Train the trainers in designing online courses.

• Design online courses for selected modules (all were evaluated, localised
  and implemented into the online learning environment).

• Disseminate and sustain the project outcomes.


The project targets students, teachers, researchers, and other actors at
universities, companies, organisations and public institutions, as well as
other users of digital language services. You can find further exercises on
post-editing and other exciting courses on digital linguistics, such as
localisation, term mining and management, or programming in Python, on our
project's online platform.


¹ Learn more on the project’s website.


1 Let’s get started... 


Artificial intelligence is changing and will continue to change the world we
live in. Many industries and jobs are also changing, with some jobs even
vanishing. Since the industrial revolution, the human workforce has been
increasingly replaced by machines. Some people are scared and fear for their
jobs. Others are happy that monotonous jobs can finally be carried out by
machines and technologies, and that humans can concentrate on more meaningful
work.

These changes are also influencing the translation market. Machine transla- 
tion (MT) systems automatically transfer one language to another within seconds 
and are coming close to achieving a dream that humans have had for centuries: 
the ability to overcome language barriers. However, MT systems have existed 
for over 70 years now and are still not capable of producing perfect translations. 
So, how do these technologies influence the market? Are translators or language 
service providers (LSPs) on the verge of extinction? 

The general translation market is continuing to grow, and the demand is huge.
Common Sense Advisory (CSA) Research, founded in 2002, conducts what it
considers “independent, objective” research on “the global content and language
services markets” (csa-research.com¹). Let us look at the following two
statements. First, CSA Research “found that the market for language services and
supporting technologies will grow 6.62% from 2018 to 2019 [...]. The industry’s
compound annual growth rate over the last 11 years was 7.76%”
(csa-research.com). The results were very similar in the years before. DePalma,
who is the Chief Research Officer of CSA Research, comments on these
developments: “People worldwide prefer consuming information in their own
language. Meeting this expectation [...] fuels an indispensable global industry
that continues growing due to global digital transformation (GDX)”
(csa-research.com). Even during the COVID-19 pandemic, when steady reporting,
market evaluations, and forecasts were not possible, CSA outlined that
“preliminary revenue reports from LSPs [...] have produced better than expected
returns for calendar year 2020” and predicted “that the language services
industry will grow faster than the overall economy in 2021” (csa-research.com²)
in March 2021.

¹ “Global market for outsourced translation and interpreting services and
technology to reach US$49.60 billion in 2019”, last accessed 07/10/2020

So, the market itself is growing, but how do the MT systems influence these 
developments? There are different opinions on this topic. If we listen to the state- 
ment of a German politician, the prospects are rather bleak. Lars Klingbeil (SPD) 
stated in the political talk show Anne Will in November 2018, where they dis- 
cussed changing work environments due to AI in general, that 


Soon, whole industries will be gone [...] Industries that still exist at the mo- 
ment, that we still need now, but they will be gone in a few years, because of 
artificial intelligence, because of technical developments. So, the question 
is, how will we, as the government, take care of the people working in these 
industries? Let’s discuss translators, interpreters for example [...] in a cou- 
ple of years, we will not need their services anymore, because technological 
developments will render them useless.³


After the talk show was aired, many professional translators were outraged
and voiced their objections. The BDÜ⁴ - a German professional association for
interpreters and translators - responded to Klingbeil’s statement:


Digitalisation and developments in the area of artificial intelligence (AI) are 
changing the working environment [...] However, these developments have 
always influenced our industry in particular, and we have always known 
not only how to adapt to the circumstances but also how to use them to 
our advantage by harnessing the technology instead of fighting it. [...]
Additionally, the translation industry is growing due to globalisation and
digitalisation.⁵


² Sizing the language industry: March 2021 update, last accessed 25/05/2021

³ Translated by the authors. Original quote: „Es werden bald ganze Branchen verschwinden. [...] die
noch da sind, die gebraucht werden, aber die in den nächsten Jahren verschwinden werden, 
durch künstliche Intelligenz, durch technologische Entwicklung. Und da ist die Frage, wie stellt 
der Staat sich eigentlich gegenüber den Menschen auf, die da arbeiten. Ich nehme mal nur das 
Beispiel der Übersetzer, der Dolmetscher [...] die wird es in ein paar Jahren als Dienstleister 
nicht mehr geben, weil technologische Entwicklung das überflüssig macht“ 

⁴ https://bdue.de/der-bdue, last accessed 15/12/2020

⁵ Translated by the authors. Original quote: „Mit der Digitalisierung und den Fortschritten im
Bereich Künstlicher Intelligenz (KI) verändern sich die Arbeitsbedingungen [...] Derartige
Entwicklungen haben aber gerade diesen Berufsstand schon von jeher begleitet und er hat es immer
wieder verstanden, sich den neuen Bedingungen nicht nur anzupassen, sondern diese sinnvoll 
zu nutzen. Und zwar unter Zuhilfenahme der technischen Werkzeuge und nicht im Wettlauf 
gegen sie. [...] Im Zuge der Globalisierung und Digitalisierung wächst zudem seit Jahren der 
Bedarf an Übersetzungen“ 


As mentioned above, the influence of technology and AI is not only noticeable 
in the translation industry, but in the majority of industries. The economist Autor 
(2014: 8-9) argues that many everyday tasks cannot be automated, because “we
don’t know ‘the rules’” - something that is challenged by machine learning - and
he continues that


[t]he fact that a task cannot be computerised does not imply that computer- 
isation has no effect on that task. On the contrary: tasks that cannot be sub- 
stituted by computerisation are generally complemented by it. This point 
is as fundamental as it is overlooked. Most work processes draw upon a 
multifaceted set of inputs: labor and capital; brains and brawn; creativity 
and rote repetition; technical mastery and intuitive judgment; perspiration 
and inspiration; adherence to rules and judicious application of discretion. 
(emphasis in original quote) 


Bowker (2020b: 267-268) paints a much more pessimistic picture in her article 
about translation technologies and ethics: 


Several authors [...] highlight a major risk associated with using CAT⁶ tools:
the concealing, overshadowing or downgrading of the translator’s contribu- 
tion. Rather than seeing a translator who interprets a source text’s meaning 
and intention and renders these in an appropriate target text, clients may 
perceive the language professional as a copy editor who simply makes mi- 
nor revisions to the “real” work that has been largely done by a machine, 
which has retrieved the correct solutions from its database or corpus. 


On the other hand, there are also more optimistic voices. DePalma (2017) and 
Lommel (2020) describe a model for augmented translation in the CSA Research 
blog. In this approach, they argue that translators are at the centre of the transla- 
tion process, surrounded by different technologies that support their work. The 
augmented translators are provided 


with more context and guidance for their projects. They work in a technol- 
ogy-rich environment that automatically processes many of the low-value 
tasks that consume an inordinate amount of their time and energy. It brings 
relevant information to their attention when needed. This computing power 
will help language professionals be more consistent, more responsive, and 
more productive, all the while allowing them to focus on the interesting 
parts of their jobs rather than on “translating like machines.” (DePalma 2017) 


⁶ Short for “computer-assisted translation”
So where do we stand at the moment? Will one of the oldest professions
become extinct? Will the work environment of translators ‘simply’ change? Or has
the profession taken a step forward and started to release the professionals from 
redundant and boring work? As we already mentioned at the very beginning, 
MT output is still not perfect, neither linguistically nor in terms of content. This 
is true for all language combinations, although different quality in the output 
can be observed for different language pairs. For some language directions and 
domains, the quality might even be exceptionally good.⁷ The quality depends
on various factors, among them text type, amount and quality of training data, 
but also similarity of the languages. To achieve high quality translations, the MT 
output has to be post-edited, which will be the topic of this textbook. 


Post-editing (PE) has become a well-established task for professional transla- 
tors. The raw machine-translated output can help the post-editor to accelerate 
the translation process and to make the translation process more profitable and 
less expensive for the client. However, the professional post-editor needs basic 
knowledge of MT and post-editing to assess PE tasks and to make informed deci- 
sions. This textbook will give you an introduction to the most relevant topics in 
professional PE. We assume, of course, that you as a user of this textbook already 
have translation experience either as a professional translator or as a translation 
student. Similar to the professionalisation in translation-from-scratch, we can 
only provide you with a starting point for PE. To make your assignment truly 
effective and profitable, you will need to practise the task with real PE jobs. 

We will provide you with examples and practical scenarios that you can apply 
to your specific PE tasks and that will guide you through your first steps as a 
professional post-editor. 

The textbook will be structured as follows. First, we will give you a brief gen- 
eral introduction to post-editing in §2 and introduce machine translation basics 
in §3. In §4, we will discuss different guidelines for the PE task. Then, we will 
talk about PE in general and crucial concepts like different text types in MT (§5), 
how PE can be integrated into CAT tools (§6), risks in PE and data security (§7), 
practical decisions for PE jobs (§8), required competences for PE, new job profiles 
and training opportunities (§9). And finally, we will wrap things up in §10. 

At the end of §2 to §9 you will find a little crossword puzzle⁸ on the contents
of the preceding chapter so you can review some of the buzzwords and main
concepts (or every now and again some details) of the chapter. Feel free to go
back if you can’t remember the answer. Maybe you want to reflect a little on the
concepts behind the answers.

⁷ See also the discussion concerning human parity, e.g. Läubli et al. (2020).
⁸ We used the package cwpuzzle to create the crossword puzzles in LaTeX (Neugebauer 2020).

Finally, we want to point out that you can find many PE exercises on the web- 
site of the Digiling project that will complement the contents of this book. Please 
do not hesitate to create a free account and use the materials to get more insights 
into practical PE and to strengthen your knowledge. 


Enjoy! 


2 Post-editing — what is it? 


Learning objectives 
Let us first concentrate on some initial concepts. 


You will learn... 
• what PE is,
• who should perform PE tasks,
• where and when PE is needed,
• about the meaning of MT in professional contexts.


Before you learn about MT, its development and its different approaches, we 
want to start with a few thoughts and notions on PE. First, lean back and consider 
the following questions. You might want to take some notes so that you can take 
a look at your answers at the end of the discussion and reflect on them. 


Have you talked about PE with colleagues? What was their opinion? Did 
their opinion influence your thoughts on PE? 


How do you think post-editing is different from translation-from-scratch? 
And how is it different compared to revising translations by colleagues? 
Do you need similar or different competences for the tasks? 


What are your greatest fears when thinking about integrating MT and PE 
into your professional workflow? What are potential risks? 


And how much do you actually know about the functionality of MT and 
the PE process? 




These questions — amongst many others — will be discussed in the following 
chapters. But let’s get started at the very beginning. First, we need to define some 
terms to ensure that we are on the same page. Post-editing (PE) “is the correction
of raw machine-translated output by a human translator according to specific
guidelines and quality criteria” (O’Brien 2011: 197-198). This definition points
out two very important characteristics of post-editing, which we discuss briefly
below.

O’Brien (2011) specifically states that PE should be performed by a human
translator and not by a layperson who is proficient in the source and target
language or - even worse - only in the target language.¹ Hence, we can assume
that PE has common features with translation - one of the topics we will discuss
in the chapter on PE competence (see §9). Further, she points out that specific
guidelines and quality criteria are important in post-editing, similar to the
translation brief for human translation jobs. They determine how much
post-editing effort is necessary for the respective post-editing job. In a later
section, we will discuss different approaches to post-editing and what is
essential to each approach. These include the following dichotomies, which we
will discuss in the given sections:


• light vs. full post-editing (see §4)
• monolingual vs. bilingual post-editing (see §4)
• post-editing vs. interactive MT editing (see §6)
As in many other disciplines, buzzwords such as “digitalisation”, “artificial
intelligence” or “industry 4.0” have also become relevant in translation studies
and practice in recent years. In fact, we have been moving towards the
automatisation of translation for decades now. We can imagine the automatisation
process of translation as a scale. At one end of the scale (in Figure 2.1 on the
right), we have human translation. The very end of the scale implies that no
electronic aids would be involved. The translators would translate using pen and
paper and printed dictionaries, and the source text document would also be
available as a printed version. At the other end of the scale, we find
fully-automatic, high-quality translation (FAHQT). Here, the human translator
would have no involvement in the translation process at all, except for
providing the source text to the machine and receiving the target text. We might
argue that both extremes are similarly unlikely. Maybe there are some scenarios
where humans and machines do not interact in the translation process at all, but
those are rather unlikely.² At the moment, we are somewhere in between both
extremes - and, as mentioned earlier, we have been there for a while now. Since
the 1990s, professional translators have been using CAT tools, a term which
typically refers to translation memory systems, terminology management systems
and project management systems (for a quick introduction, see e.g. Folaron
2010). However, the use of word processing programs or electronic/online
dictionaries also counts as a step towards automatisation. When those tools are
used, we speak of machine-aided human translation (MAHT). The human is still at
the centre of the translation process, but is supported by the machine. One
further step towards automatisation is what is called human-aided machine
translation (HAMT). Here, MT systems are involved and the human “solely” has to
prepare the source text for the machine (pre-editing) and/or improve the MT
output (post-editing). The latter is what we will focus on in this book (marked
with a dotted line in Figure 2.1). What you have to keep in mind, though, is
that MT output is still only a tool for the professional translator (if you do
not agree now, you will probably agree after you have finished the book). The
focus in pre- and post-editing is more on the machine and a certain amount of
the work is performed by the machine. However, the professional translator is
still responsible for transforming the MT output into a translation that matches
the quality criteria defined for the target text.

[Figure 2.1 shows a scale running from fully automatic high-quality translation
(FAHQT; no human interaction involved, extreme) via human-aided machine
translation (HAMT; pre-editing, post-editing) and machine-aided human
translation (MAHT; translation memory systems, terminology management) to
traditional human translation (no electronic help involved, extreme). The middle
of the scale is labelled computer-assisted translation (CAT).]

Figure 2.1: Dimensions of human and machine translation, adapted
from Hutchins & Somers (1992: 148)

¹ There are also methods for automatic post-editing, which we, however, will not
discuss in further detail. See do Carmo et al. (2020) for an overview and
evaluation of automatic PE methods.

² Of course, this again refers to professional translation practice. Some
students still have to write their exams at universities with pen and paper -
the pros and cons of this procedure, however, will not be discussed here.
Similarly, there have been scenarios for FAHQT for very restricted text types
for decades. See e.g. the Météo system from Canada in §3.1.

As Allen (2003) already pointed out, PE introduced a new perspective to
translation studies, because translators never really had to deal with
“half-finished”
texts before. In PE, the target text does not need to be produced from scratch. 
The translators already have an outline for the final product. Hence, PE and hu- 
man translation can be considered different tasks. Further, machine-translated 
texts have different characteristics than human translations. Therefore, PE can- 
not be seen as another form of proof-reading either. While some mistakes, like 
spelling and typing errors, hardly ever occur in MT output, other mistakes, e.g. 
syntax or lexicon errors, would almost never occur in human translation. Accord- 
ingly, there are many interesting questions that must be answered to define the 
nature of PE. Here, we have only listed a few: 


• How much MT is acceptable?
• How much PE effort is necessary?
• How much time (and money) can be saved?
• What is the quality of the post-edited target text?
• What is the difference from human translation/proof-reading?


We will discuss all these questions in the following sections, providing you with 
a grounded understanding by the time you finish this book. 


From a research point of view, post-editing “is a field in which the human
translator and the machine meet - as well as the two disciplines Machine
Translation and Translation Science” (Čulo et al. 2014: 35). Hence, it is very
interesting
for interdisciplinary research as well. Since this book is intended to be a rather 
practical guidebook, we do not want to go into detail concerning scientific re- 
search on post-editing. We just want to say a few words on the importance of 
research for this subject matter and where you can find relevant literature and 
databases with empirical PE data. 

First of all, we want to clarify that there are some initial approaches to basic 
theoretical research concerning PE. One promising theory to describe PE phe- 
nomena seems to be the relevance-theoretical approach as cognitive and prag- 
matic aspects are connected. Post-editors are trained professionals who are able 
to bridge a communicative gap between languages by editing machine translation
output situated in the target context. This task is based on well-founded
decisions regarding the source text, the intended recipients, the target culture 
and the post-editing brief. On a cognitive level, relevance theory assumes that 
MT output should be edited with the least possible effort envisaging efficient and 
successful communication. Alves et al. (2016) discussed relevance-theoretical as- 
pects for PE. The temporal and cognitive effort is ideally reduced during post- 
editing as translation options are suggested by the MT system, which the post- 
editor “solely” has to accept, reject or modify. Typically, guidelines are provided 
for each PE project to support the post-editors’ decision-making process. A rule 
that often occurs in PE guidelines is that as much of the raw MT output as pos- 
sible should be retained to make the process fast and efficient. This also has the 
advantage for the client that the costs for the translation process decrease. On
the other hand, however, this also means that the recipients are expected to
invest more cognitive effort when reading the target texts as the texts are not
linguistically and/or stylistically perfect. Carl & Schaeffer (2019) combined
relevance
theory and the noisy channel model to approach PE theoretically. They propose 
a “model in which [relevance theory] complements the ‘Noisy Translator Channel’
by adding constraints of causal interrelation between stimulus, context and
interpretation, established by the principle of relevance” (Carl & Schaeffer
2019: 60).
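The guideline mentioned above, retaining as much of the raw MT output as possible, can be made tangible with a small calculation. The following Python sketch is only an illustration and not a method taken from the studies cited here: it assumes simple whitespace tokenisation, uses the standard-library `difflib` module, and the function name `retained_share` as well as the example sentences are hypothetical.

```python
from difflib import SequenceMatcher


def retained_share(raw_mt: str, post_edited: str) -> float:
    """Return the share of raw MT tokens that survive in the post-edited text.

    Assumption: tokens are split on whitespace; a real study would use a
    proper tokeniser and an established edit-distance metric.
    """
    raw_tokens = raw_mt.split()
    pe_tokens = post_edited.split()
    if not raw_tokens:
        return 0.0
    matcher = SequenceMatcher(None, raw_tokens, pe_tokens, autojunk=False)
    # Sum the lengths of all token runs that appear unchanged in both texts.
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(raw_tokens)


# Hypothetical sentences, invented purely for illustration.
raw = "the cat sat on the mat"
edited = "the cat sat on a mat"
print(f"{retained_share(raw, edited):.2f}")
```

A value close to 1 suggests that the post-editor kept most of the raw output, while a low value suggests heavy rewriting. The figure says nothing about the quality of the result or about the cognitive effort involved.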

In addition to these theoretical considerations, there is a whole range of empir- 
ical studies comparing post-editing to translation-from-scratch, addressing the 
following research questions (of course, this is only a selection): 


• How efficient is PE compared to human translation?
• Can cognitive effort while post-editing MT be measured?
• How good is the quality of post-edited texts?
• Can MT errors be predicted and PE effort be estimated?
• Can PE effort be correlated to MT quality?
• Are some language pairs more suitable for MT and PE than others?
• Which text types, genres and modalities are particularly suitable for
  post-editing?
• Is there a difference in PE performance when comparing students and
  professionals?




From a methodological point of view, most studies rely on a multi-method ap- 
proach combining eyetracking with keylogging data. In addition, questionnaires 
describe the metadata with respect to the participants, such as personal data as 
well as translation and language competence and experience. There is a widely 
established research database that includes PE and translation data for several 
language pairs and levels of expertise: the CRITT Translation Process Research 
Database (CRITT TPR-DB, Carl et al. 2016). This database enables the triangula- 
tion of different kinds of data, which in turn sheds light on sequential and par- 
allel cognitive processing activities, reading and writing processes, post-editing 
and research strategies. If you would like to learn more about the database, the 
studies and the resulting publications, please consult the following website³.


³ last accessed 12 April 2021
Crossword puzzle - chapter 2


Across
1 Who should be the post-editor? A professional ...
2 Specific guidelines and quality criteria define how much ... a post-editor has
  to put into the post-editing job.
6 What kind of studies exist about post-editing aside from theoretical
  considerations?
7 What kind of methodology is often used in cognitive translation studies in
  addition to keylogging?
8 What is the name of the Translation Process Research Database?

Down
3 What is the term that refers to translation memory systems, terminology
  management systems and project management systems?
4 What do we call the preparation of the source text for machine translation?
5 According to which theory would we assume that MT output should be edited with
  the least possible effort to achieve the most efficient and successful
  communication?
3 MT history - how has machine translation developed?


Learning objectives 


You will learn... 


• how MT developed in the last decades,
• how the different MT systems generate their translation,
• how suitable the different MT architectures are with respect to
  post-editing.


You might get the impression that MT is a rather new invention because it 
became increasingly visible with the rise of statistical and especially neural MT 
engines. However, the first ideas about automating language and translation date 
back centuries. In order to understand which approaches are suitable for post- 
editing and under which conditions, it is important to learn something about the 
commonalities and differences of the MT architectures and how they developed 
over time. 

This section will shed light on the historical development of MT in §3.1 and 
will introduce the basic MT approaches and discuss their suitability with respect 
to post-editing in §3.2. 


3.1 Historical development of MT 


Table 3.1 shows the most important historical milestones, starting in the 1930s
with the first events that initiated the actual development of the first
systems. The notion of automating translation processes is much older, but it
was only theoretical then.




If you want to learn more about the historical development of MT, we recom- 
mend reading, for example, Hutchins (2000), Hutchins (2007), and/or Schwartz 
(2018). 


Table 3.1: History of MT. 


History of MT 
Event Description 
1933 - The French-Armenian Georges Artsrouni and the Rus- 
First patents for sian Petr Troyanskii independently proposed patents 
“Translating for “translating machines” as early as 1933. Troyanskii 
Machines” proposed not only a method for an automatic bilingual 


dictionary on paper tape, but also a scheme for coding 
interlingual grammatical roles (based on Esperanto) 
and an outline of how analysis and synthesis might 
work. However, Troyanskii’s ideas were unknown to 
the community until the end of the 1950s. 


1949 - A memorandum titled “Translation” by Warren 

Weaver Weaver in July 1949 introduced the idea of MT to the 

Memorandum general public. It is often considered the starting point 
of research on MT. 

1952 - The first full-time researcher in MT was appointed 

First MT at the Massachusetts Institute of Technology (MIT) 

conference in 1951, namely Yehoshua Bar-Hillel. He believed 


that fully-automatic, high-quality translation (FAHQT) 
could be possible. In 1952, the first conference on MT 
was held at MIT and covered topics such as pre-editing 
and post-editing, controlled language, domain restric- 
tions, syntactic analysis as well as computer hardware, 
programming and funding. 
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3.1 Historical development of MT 


History of MT 
Event Description 
1954 - During the Cold War, the first official MT projects were 
Georgetown- launched for the US military and also in the Soviet 


IBM Experiment Union. Those projects focused on MT between Rus- 
sian and English. They produced rather poor quality 
but were still popular for political and military rea- 
sons. Thus, most US research was for Russian-English 
translation, and most Soviet research was on English- 
Russian systems. The focus was on purely informative 
translation purposes without PE or any involvement of 
third-party translators or interpreters. 

At Georgetown University, Leon Dostert collaborated 
with IBM on a project known as the Georgetown-IBM 
experiment which resulted in the first public demon- 
stration of an MT system in January 1954. The project 
involved the fully automatic translation of more than 
sixty sentences from Russian into English. The exper- 
iment, which was conducted with well-chosen sen- 
tences, launched the first MT hype. This stimulated 
large-scale funding of MT research in the USA and in- 
spired the initiation of MT projects elsewhere in the 
world, notably in the USSR. 


1961 - The Linguistic Research Center (LRC) at the University 
Machine of Texas concentrated on basic syntactic research of 
Translation English and German. Efforts were made to devise re- 


Research in Texas || versible grammars to achieve bidirectional translation 
within an essentially “syntactic transfer” approach. 
This laid the foundations for the later successful devel- 
opment of the METAL system which started in 1979 in 
cooperation with Siemens AG. 
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MT history - how has machine translation developed? 
History of MT 

Event Description 

1966 -
ALPAC Report
announces FAHQT
impossible

In November 1966, the Automatic Language Processing Advisory Committee (ALPAC)
issued a report to the Pentagon which brought the substantial funding of MT
research in the United States to an end for about twenty years. The clear
message of the ALPAC Report to the general public and the rest of the scientific
community was that there is basically no hope for MT. The title of the report
was “Language and machines: computers in translation and linguistics”.
Supposedly, it dealt not just with MT but also with computational linguistics.
However, in practice, most funded NLP research was devoted to developing MT at
the time.

1968 -
The Beginnings
of Systran

Engineering companies continued less visible research on MT. Systran, one of
the oldest MT companies, was founded by Peter Toma in 1968. Its first
commercial MT system was rule-based rather than statistical.

1970-1978 -
Météo

In 1970, research began on a syntactic transfer system for English-French
translation at Montreal. The TAUM project (Traduction Automatique de
l'Université de Montréal) had two major achievements: firstly, the Q-system
formalism, a computational metalanguage for manipulating linguistic strings and
trees and the foundation of the Prolog programming language widely used in
natural language processing; and secondly, the Météo system (1978) for
translating English weather forecasts into French.

1972-1986 -
SUSY

Researchers at Saarbrücken University developed SUSY (Saarbrücker
Übersetzungssystem), a highly modular multilingual transfer MT system.

1976 -
EC and Systran

The European Commission first introduced Systran for in-house purposes.


1978-1992 -
The EUROTRA
project

The EUROTRA project was founded by the European Commission with the aim of
developing a state-of-the-art MT system for the then seven, later nine, official
languages of the European Community. In 1978, research on different MT projects
began, e.g. research leading to the ARIANE system at Grenoble University
(Russian-French-English-German), the LOGOS system (USA; now OpenLogos for
German-English), and the GRADE system within the Mu project at Kyoto University
(Japanese-English).


1985 -
Prolog

In 1985, the Japanese government started the 5th Generation Project and adopted
the Prolog programming language, also for commercial MT, which led to the
development of LMT (Logic programming MT).


1989 -
METAL System

Already in the 1970s, Siemens started working on a transfer approach for a
machine translation system together with the LRC (Linguistic Research Center) in
Texas. Originally titled the Linguistics Research System (LRS), it was later
renamed METAL (Mechanical Translation and Analysis of Languages) and finally
became commercially available in 1989.


1992 -
Automatic
Evaluation of MT

The development of MT evaluation metrics started simultaneously with the
development of statistical MT systems. Initial evaluations were carried out by
human assessors according to FEMTI (Framework for the Evaluation of Machine
Translation in ISLE). In 1992/1994, DARPA, a research agency of the United
States military, investigated unedited output of MT systems, comparing automatic
measures and human judgments of adequacy, fluency and informativeness.


1993 -
Translation
Memory Systems

The first commercial translation memory system, launched in the 1990s, was named
TRADOS (TRAnslation and DOcumentation Software) and used aligned bilingual
corpora. Trados was established in Stuttgart, Germany by Jochen Hummel and Iko
Knyphausen in 1984.
1997 -
First free online
MT

After Systran started offering its service of online machine translations of
entire webpages, the first free online machine translation was launched in
December 1997 with Babel Fish on Alta Vista (later Yahoo), which was free for
all Internet users but was discontinued in 2008.


2000-2001 - 
Crash of dotcom 
companies 


After the crash of the dotcom companies in 2000, the number of MT software
companies decreased to 10-20 active companies and universities that were
working on MT. They made up only 1% of revenue on the translation market despite
the growing demand for MT applications. The still growing number of digitally
available texts due to globalisation requires more translations with all sorts
of language combinations even today.


2001-2005 - 
BLEU 


In 2001, the BLEU score (Bilingual Evaluation Understudy) was developed. It
consists of statistical measures of similarity between SMT output and human
(reference) translations. Other automatic evaluation scores include the NIST
metrics by the National Institute of Standards and Technology and METEOR
(Carnegie Mellon) in 2005.


2002 - 
Language Weaver 


Language Weaver, which was founded by Kevin Knight and Daniel Marcu in 2002,
was the first company to commercialise a statistical approach to automatic
language translation and natural language processing.


2003-2004 - 
Expansion of EU 
and growing 
demands 
With the expansion of the EU, the number of official
languages increased from 11 to 20 languages. At peak 
times, the European Union produced more than 1.4 mil- 
lion pages of text per month. Since then, about 380 lan- 
guage combinations are possible for translation at the 
EU, which can no longer only be covered by human 
translators. 



Since 2004 -
Open Source
toolkits for MT

With the growing demand for language pairs, numerous open source toolkits for MT
systems appeared. Examples are: GIZA++ (alignment tool for SMT); Moses (a
platform for building SMT systems); Joshua (a decoder for syntax-based,
hierarchical SMT); Apertium (a platform for building rule-based MT); META-SHARE
(database for EU projects).

2006 -
Google Translate
SMT and
Euromatrix

On 28 April 2006, Google Translate went online with a statistical MT system. The
first system used transcripts from the UN and the European Parliament to gather
enough training material. Further, the system used English as a transfer
language. Later, Google expanded to further language pairs and applied a hybrid
method. Around the same time, the Euromatrix project was founded with the aim of
providing MT systems for all EU languages (over 500 language pairs), which
involved a different research partner.

2010 -
SDL Acquires
Language Weaver

In July 2010, the leading language services company SDL acquired Language
Weaver/BeGlobal Statistical Machine Translation and the MT system became SDL
Language Weaver.


2016 -
Google
introduces NMT

In 2016, Google Translate was one of the first to introduce neural machine
translation (NMT). In November 2016, software engineer Harold Gilchrist, leader
of the Google research team developing the Google Neural Machine Translation
system (GNMT) to increase fluency and accuracy in Google Translate, announced
the switch from SMT to GNMT.

2016 -
SDL Adaptive
MT

SDL introduces Adaptive MT for SDL Trados Studio 2017, a self-learning machine
translation engine.

2017 -
DeepL is
introduced

DeepL is another NMT system that was introduced by the online dictionary and
corpora provider Linguee.


Future 


3.2 MT architectures 


As you have seen above, different approaches have been developed to automatise 
the translation process. Here, we will discuss the pros and cons for rule-based, 
statistical and neural MT and their usability for PE workflows. 


3.2.1 Rule-based machine translation (RBMT) 


Rule-based approaches were the catalyst for the development of MT. Generally, 
these systems attempt to define the individual characteristics of the source lan- 
guage and how these need to be converted into the target languages. Chesterman 
(1997: 31) mentions that he sees this early form of MT as “the Linguistic meme 
of translation theory”, because it assumes that languages can solely be expressed 
through rules, which, accordingly, must also be representable in algorithms. Dif- 
ferent rule-based approaches had been developed over the years to generate MT: 


• Direct MT: This type of MT is constructed specifically for one language
pair and usually one translation direction. Essentially, the words of the
source text are morphologically analysed and then looked up in a dictionary,
which means that ideally all morphology rules are defined, so that
the dictionary only has to contain the stems of the words. In the next steps,
the words of the source language are replaced by the words in the target
language and all morphological changes required by the target language
are applied.


• Transfer-based MT: The transfer-based approach constructs a syntactic
representation of the source text (often in a tree structure) that is free of
ambiguities, etc. Next, this representation is generated for the target
language with the help of a grammar that contains the bilingual transfer rules.
The target text can be produced now. It is possible to use these systems in
both language directions, but this is rarely done in practice, because the
transfer rules often cannot be applied in both directions.


• Interlingua-based MT: For this approach, a so-called Interlingua needs to
be created. This Interlingua represents meaning in an abstract form, which
can theoretically be achieved by either a natural or an artificial language
or a language-independent representation. The basic principle of this
approach is that the source text is translated into the Interlingua and then
the Interlingua into the target language. This approach is suitable for
multilingual systems.
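
To make the direct approach from the list above more tangible, here is a minimal Python sketch of its three steps (morphological analysis, dictionary lookup, morphological generation). It is only an illustration: the tiny English-Spanish stem dictionary, the single plural rule and all function names are invented for this example and do not come from any real MT system.

```python
# Toy direct MT (English -> Spanish). All data and rules are invented examples.
STEMS = {"cat": "gato", "book": "libro", "house": "casa"}   # stem dictionary
INVARIABLE = {"two": "dos", "three": "tres", "and": "y"}    # words without inflection


def analyse(word):
    """Morphological analysis with one toy rule: a final -s marks the plural."""
    if word.endswith("s") and word[:-1] in STEMS:
        return word[:-1], True
    return word, False


def generate(target_stem, is_plural):
    """Morphological generation: re-apply the plural in the target language."""
    return target_stem + "s" if is_plural else target_stem


def translate_direct(sentence):
    """Replace the source words one by one; unknown words are flagged with <...>."""
    output = []
    for word in sentence.lower().split():
        stem, is_plural = analyse(word)
        if stem in STEMS:
            output.append(generate(STEMS[stem], is_plural))
        else:
            output.append(INVARIABLE.get(stem, f"<{word}>"))
    return " ".join(output)


print(translate_direct("Two cats and three houses"))
```

Even this small example hints at why direct systems are brittle: every word that is missing from the dictionary, and every phenomenon that is not covered by a rule (agreement, word order), ends up as an error in the output.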


Please note that this is only a brief introduction to the main concepts of these 
approaches. For overviews on rule-based approaches see Hutchins & Somers 
(1992) or Wilks (2008). You can find a visualisation of the different rule-based 
approaches in Figure 3.1. This pyramid was introduced by Vauquois (1968) and 
shows the different steps that lie between source text and target text generation. 


[Figure 3.1: Vauquois pyramid (Vauquois 1968). Source text and target text form
the base of the pyramid and are linked by direct translation; the Interlingua
forms the apex.]


Concerning PE, this approach seems especially suitable for the translation of 
texts adhering to a controlled language. Controlled languages are defined by a set 
of rules, which can theoretically be directly implemented into rule-based systems. 
The main disadvantage of these approaches is, however, that it takes a lot of effort 
to develop the systems, because the better and more comprehensive the intended 
system, the more rules have to be defined. If morphology, grammar, and syntax 
are only defined superficially, the source text might be generated incorrectly, 
which leads to (severe) mistakes in the target language. Further, the rules have 
to be defined from scratch for every language involved or sometimes even for 
every language direction, depending on the MT approach. 
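
The development effort mentioned above can be estimated with simple arithmetic. The following sketch assumes a common counting convention that is not spelled out in the text: an architecture that needs rules per language direction requires one module for every ordered language pair, whereas an interlingua system requires one analysis and one generation module per language. The function names are hypothetical.

```python
def modules_per_direction(languages):
    """One module per ordered language pair (direct or transfer-based systems)."""
    return languages * (languages - 1)


def modules_with_interlingua(languages):
    """One analysis and one generation module per language."""
    return 2 * languages


for n in (3, 11, 20):
    print(n, modules_per_direction(n), modules_with_interlingua(n))
```

Under these assumptions, 20 languages lead to 380 language directions but only 40 interlingua modules, which is why the interlingua idea is attractive for multilingual systems despite the difficulty of designing the Interlingua itself.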

Today, rule-based approaches are outdated and can usually only be found in 
hybrid systems or in very old, established systems. For example, the Pan Amer- 
ican Health Organization (PAHO) still uses a rule-based system, because they 
already established their MT system in the 1980s and traditionally work with 
only three languages.¹


¹Find out more on the PAHO’s webpage on their MT system PAHOMTS, last accessed 27 April
2021.




More promising are data-based approaches, which render the human program- 
ming of linguistic rules for MT unnecessary. Data-based MT relies on mono- 
and multilingual corpus data. The following sections will give an overview of 
the most prominent data-based MT architectures: statistical machine translation 
(SMT) and neural machine translation (NMT). 


3.2.2 Statistical machine translation (SMT) 


SMT had been the state of the art for decades. The basic idea of this approach 
is to generate a translation from a parallel training corpus by calculating the 
most likely equivalent of a source word/phrase/sentence in the target language. 
Statistical translation models are generated and trained on corpus data. Both 
mono- and bilingual corpora are used to capture the typical linguistic structures 
of the languages involved - the monolingual corpora generate the target lan- 
guage model and the bilingual parallel corpora generate the translation model. 
In addition, SMT uses so-called n-grams: sequences of n words (usually
n < 7) that are assigned probabilities representing how likely the word sequences
are to occur in the training corpus. Further, additional information can be extracted
during the training phase, e.g. models of relative sentence length. The training 
of SMT systems can be realised relatively quickly if aligned parallel corpora are 
available. Training means in this context that the source text is analysed. Decod- 
ing, on the other hand, means in this context that the target text is generated. In 
between, there is the tuning phase, where the system tries to find the best values 
for the respective sentences or n-grams. Two models are commonly differenti- 
ated to calculate the most likely translation: the noisy channel model and the 
log-linear model. We do not want to go into detail here; please refer to Hearne
& Way (2011) for further information.
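
The role of n-gram probabilities in the target language model can be sketched in a few lines of Python. This is a simplified illustration with bigrams (n = 2), an invented three-sentence "monolingual corpus" and plain relative frequencies without smoothing; real SMT toolkits use much larger corpora and more refined estimation.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count histories and bigrams, with <s> and </s> as sentence boundary markers."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        histories.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return histories, bigrams


def bigram_probability(model, previous, word):
    """Relative frequency of `word` directly after `previous` in the corpus."""
    histories, bigrams = model
    if histories[previous] == 0:
        return 0.0
    return bigrams[(previous, word)] / histories[previous]


def sentence_probability(model, sentence):
    """Multiply the bigram probabilities of all adjacent word pairs."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    probability = 1.0
    for previous, word in zip(tokens, tokens[1:]):
        probability *= bigram_probability(model, previous, word)
    return probability


corpus = ["the house is small", "the house is big", "the garden is small"]
model = train_bigram_model(corpus)
print(sentence_probability(model, "the house is small"))
print(sentence_probability(model, "small is house the"))
```

The fluent word order receives a clearly higher probability than the scrambled one, which is the kind of evidence a language model contributes when translation candidates are ranked.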

In addition to the translation and language models, other features that contain 
linguistic information can be implemented to help calculate the most likely trans- 
lation. These include a phrase table for each language direction, lexical transla- 
tion probabilities, a model for phrase reordering, or a word or phrase penalty 
that controls the length of the target sentence. All these features help to calcu- 
late the most likely translation and select it among other translation candidates. 
Each feature obtains a value that represents its weight in the algorithm. 
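
How weighted features select one candidate among several can be sketched as a log-linear score, i.e. a weighted sum of feature values. The feature names, weights, probabilities and candidate sentences below are invented for illustration only; in a real system the feature values come from the trained models and the weights from the tuning phase.

```python
import math

# Hypothetical feature weights (in practice these are set during tuning).
WEIGHTS = {"translation_model": 1.0, "language_model": 0.8, "word_penalty": -0.2}

# Hypothetical candidates for "the house is small" with invented feature values.
CANDIDATES = {
    "das Haus ist klein": {
        "translation_model": math.log(0.4),
        "language_model": math.log(0.3),
        "word_penalty": 4,
    },
    "das Haus klein ist": {
        "translation_model": math.log(0.4),
        "language_model": math.log(0.01),
        "word_penalty": 4,
    },
    "das Haus ist sehr sehr klein": {
        "translation_model": math.log(0.1),
        "language_model": math.log(0.05),
        "word_penalty": 6,
    },
}


def log_linear_score(features, weights):
    """Weighted sum of feature values; higher scores mean better candidates."""
    return sum(weights[name] * value for name, value in features.items())


best = max(CANDIDATES, key=lambda text: log_linear_score(CANDIDATES[text], WEIGHTS))
print(best)
```

Changing the weights changes the ranking, which is why the weights have to be adjusted carefully for each system.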

A final training task involves the evaluation of the system’s performance and
the adaptation of the values that are given to the individual features. One widely
adopted technique to tune the system is the so-called MERT technique, which
is short for Minimum Error Rate Training (see also Och 2003). The MERT technique
usually includes the BLEU metric. To keep it simple, this is an automatically
calculated value that evaluates the quality of an MT system by comparing
the translation output of the system to a reference translation.
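
The idea behind such a score can be illustrated with a strongly simplified, sentence-level variant in Python. It is a sketch under stated assumptions, not the official metric: it uses a single reference, n-grams only up to length 2 by default and no smoothing, whereas BLEU is normally computed over a whole test corpus with n-grams up to length 4.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngram_counts(cand, n), ngram_counts(ref, n)
        # Each candidate n-gram is credited at most as often as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Candidates that are shorter than the reference are penalised.
    brevity_penalty = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


print(simple_bleu("the house is small", "the house is small"))
print(simple_bleu("the house is tiny", "the house is small"))
```

Note that the score only measures surface overlap with the reference: a perfectly acceptable translation that uses different words is penalised, which is one reason why automatic scores cannot replace human evaluation.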

An advantage of post-editing SMT texts is that the errors to be corrected are
quite predictable. SMT systems always produce the same errors as long as they
are not trained with new or extended training corpora. The code of the system is
transparent and the calculation of translation probabilities straightforward. For a
given language direction, typical errors can be identified, which means that the
post-editor can systematically train error spotting and correction (Čulo et al. 2014,
Nitzke 2019).

Recent developments involving SMT have attempted to unite different ap- 
proaches - usually rule-based and statistical - in hybrid systems so that the ad- 
vantages of each approach can be combined. Systems with deep integration con- 
struct a whole new system that combines the advantages of the two approaches. 
Shallow integration systems, on the other hand, unite two or more existing sys- 
tems into one new system. 


3.2.3 Neural machine translation (NMT) 


The latest approach to MT uses neural networks, which are also trained on
parallel corpora. NMT systems build large neural networks for translation,
while statistical MT systems are composed of many small subcomponents.
NMT systems use deep-learning approaches and learn automatically from the 
training data. 

At least three basic layers are involved in neural machine translation: the in- 
put layer, the output layer and at least one hidden layer in between. In the input 
layer, the source text is processed and in the output layer, the target text is cre- 
ated. The hidden layers are the processing steps. The model can work in a more 
fine-grained way and more complex tasks can be tackled when more hidden lay- 
ers are included in a system. During training, mathematical representations and 
weights are assigned to the source and target segments in the training data. The 
training is comparably time-consuming as the system runs through the train- 
ing data several times to adjust this structure. Further, no specific rules can be 
added manually, as the system develops the structure of the hidden layers auto- 
matically. Hence, only input and output layers are known, but the rest is more 
or less a black box (see Koehn 2017 for detailed information), although tools and
methods have been developed to interpret the decisions within a system (e.g. Vig
2019).
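
The interplay of input layer, hidden layer and output layer can be sketched with a tiny feed-forward network in plain Python. This is a didactic toy and not an NMT system: the layer sizes and all weights are invented, whereas a real system learns millions of weights from the training data.

```python
import math


def dense_layer(inputs, weights, biases, activation):
    """One fully connected layer: every unit sums its weighted inputs plus a bias."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(weights, biases)
    ]


def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    highest = max(scores)
    exponentials = [math.exp(score - highest) for score in scores]
    total = sum(exponentials)
    return [value / total for value in exponentials]


# Invented weights: 3 input units -> 2 hidden units -> 3 output units.
HIDDEN_WEIGHTS = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
HIDDEN_BIASES = [0.0, 0.1]
OUTPUT_WEIGHTS = [[1.0, -1.0], [-0.5, 0.7], [0.2, 0.2]]
OUTPUT_BIASES = [0.0, 0.0, 0.0]


def forward(source_vector):
    """Input layer -> hidden layer (tanh) -> output layer (softmax)."""
    hidden = dense_layer(source_vector, HIDDEN_WEIGHTS, HIDDEN_BIASES, math.tanh)
    scores = dense_layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES, lambda value: value)
    return softmax(scores)


# A one-hot vector stands for one source word; the output is a probability
# for each of three hypothetical target words.
probabilities = forward([1.0, 0.0, 0.0])
print(probabilities)
```

The values inside the hidden layer have no meaning that a human could read off directly, which illustrates on a very small scale why the inner workings of a trained network are hard to interpret.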

Two approaches are common in neural machine translation: transformer and 
recurrent encoder-decoder models. In the encoding phase, the meaning of the 
source text is encoded into a vector with a fixed length. Transformer and recur- 
rent systems differ in the way they encode the source text. In the decoder phase, 
the target segment is produced word for word. During the production, the system 
considers the surrounding words as context. One disadvantage of this system is 
that it has difficulties with long sentences. To overcome these problems, so-called
alignment models are implemented. These are often also called attention models.
To improve the translation output, an additional layer is put between the input 
and the hidden layer. This additional layer embeds words, which means that the 
system is capable of considering context. This means that all content words are 
assigned to a representation. Words that are close content-wise are represented 
closely. Hence, similar words are clustered and are translated in similar manners. 
If you want to learn more on the architecture of NMT systems, read the compre- 
hensible introduction by Pérez-Ortiz et al. (forthcoming). 
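
What it means that similar words are "represented closely" can be shown with a short Python sketch. The three-dimensional vectors below are invented; real embeddings are learned during training and typically have several hundred dimensions. Cosine similarity is one common way to measure how close two such vectors are.

```python
import math

# Invented toy embeddings; real systems learn these values from data.
EMBEDDINGS = {
    "house": [0.9, 0.1, 0.0],
    "building": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means the same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbour(word):
    """Return the other word whose embedding is most similar to that of `word`."""
    others = [candidate for candidate in EMBEDDINGS if candidate != word]
    return max(others, key=lambda c: cosine_similarity(EMBEDDINGS[word], EMBEDDINGS[c]))


print(nearest_neighbour("house"))
print(cosine_similarity(EMBEDDINGS["house"], EMBEDDINGS["banana"]))
```

In this toy space, "house" and "building" point in almost the same direction, while "banana" lies far away, so a system relying on these vectors would treat the first two words similarly.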

As current NMT systems for interlingual translation outperform
all other systems in many cases (Bentivogli et al. 2018; Toral & Sánchez-Cartagena
2017), not only redundant and highly standardised text types are in the focus of 
attention (as is the case in technical documentation, software localisation etc.), 
but also creative text types such as literary translation (Toral et al. 2018) or sub- 
titling (Tardel 2020). It might also be very interesting to test NMT for intralin- 
gual translation by translating Standard Language into Easy Language (as can 
be found in approaches to automatic text simplification, see Specia 2010). For 
example, it seems plausible that the neural networks might be able to represent 
the rules for translating standard German into Easy German during the train- 
ing phase. For this purpose, a parallel corpus of standard German source texts 
and Easy German target texts will be necessary (e.g. Klaper et al. 2013). An NMT 
system which differentiates between different scales of complexity and difficulty 
(Easy Language — Plain Language - Standard Language - Specialised Language) 
would revolutionise the whole area of accessible communication. 

With respect to PE, one of the advantages of NMT is that the machine-trans- 
lated output seems to be (much) better compared to other system architectures, 
at least when it comes to fluency. However, we can only get good results in the 
MT output if we have enough training material to feed the system. If we do not 
have enough training material, we get poor quality. This is often problematic 
for smaller languages and rare language combinations as they are often under- 
represented and poor in resources. Further, as with all data-driven MT systems, 
the output can only be as good as the training data. Hence, if we train a system 
with poor quality translations, we get poor output. The same applies for domain 
specific translations. If the system is not well-trained on the specific domain, the 
output is poor as well. In total, the system is much more vulnerable to noisy data. 


Newest developments, however, allow the combination of NMT and the train- 
ing of specific terminology, which tackles the domain problem (e.g. Michon et al. 
2020). Another advantage is that NMT systems contain one compact system that 
does not have several components. However, training takes much longer than it 
takes for SMT and it also needs more computer processing capacities. 

Finally, we want to point out that the generally better NMT output quality 
leaves us with the following paradox: the better the NMT translations are, the 
more difficult the error spotting is since the NMT output appears to be more 
fluent and less error-prone. This makes, on the one hand, the PE process even 
more demanding and leads to more cognitive effort for the post-editor. On the 
other hand, due to the absence of “real” errors, the post-editors tend to correct 
more style errors, which in turn leads to over-editing (for details see Vardaro et 
al. 2019). Hence, post-editors need a lot of training and awareness for the error 
types to be able to correct the texts efficiently. 


Crossword puzzle — chapter 3 


Across
1 Who wrote the memorandum titled “Translation” published in 1949?
2 Which university hosted the first public demonstration of an MT system in January 1954?
3 Which languages were involved in the first public demonstration of an MT system in January 1954? English and...
4 What was the name (abbreviation) of the report that ended funding of MT research in the United States for about twenty years?
5 What is the name of one of the oldest MT companies that was founded by Peter Toma?
7 What was the name of the first free online machine translation that was launched in December 1997?

Down
6 What was translated by the Météo system?
8 With what kind of approach did Google launch their first online MT system?
9 What is the name of a language that represents meaning in an abstract form?
10 What is the state-of-the-art MT approach? ... MT
4 Post-editing guidelines - how to post-edit?


Learning objectives 


You will learn... 


• about different guidelines, especially light and full PE,
• about monolingual PE and the associated problems,
• what is included in the PE Norm,
• how to adhere to different guidelines.


Every translation and thus also every PE project has different requirements. 
Especially in PE, the required target text quality might vary a lot, depending 
on various factors like reader, distribution, or duration of use. Hence, the effort 
invested in the PE process might differ and how much effort should be invested 
while post-editing a text has to be defined in advance. 


4.1 Considerations on PE guidelines 


You might wonder why we need guidelines for PE. PE is not per se intended to 
generate perfect, high quality texts. The main goal is to save time and money. 
Therefore, it is very important to define the final quality criteria and editing 
needs in advance. You have to ask yourself different questions to judge the re- 
quired target text quality and thus the PE effort that needs to be invested. As 
these decisions are usually not made by the post-editors themselves, we will dis- 
cuss the decision making process later in §8. For now, we have to keep in mind 
that we need PE guidelines to achieve the defined target text quality, which 
should ideally be communicated with the post-editing job. In addition, guide- 
lines help us to make the target text as consistent as possible even if different 
post-editors work on the same text - similar to the processes in technical docu- 
mentation or other domain-specific translations. 

In general, we differentiate between light vs. full PE. This differentiation is 
quite superficial for real PE projects as discussed in Nunziatini & Marg (2020). 
The guidelines might differ a lot from project to project and every PE project 
might focus on different quality aspects, e.g. using the correct, pre-defined ter- 
minology might be far more important in technical documentation than in post- 
editing a newspaper article. However, the differentiation between light and full 
PE is well-established and will give you an impression of the continuum in which 
we are working. 

The guidelines we want to introduce to you were established by TAUS
(Translation Automation User Society).¹ The society was established in 2005. It is an
independent organisation that is concerned with automation and innovation in
translation.


4.1.1 Light PE 


The first set of guidelines we want to talk about are what TAUS calls the
guidelines for achieving “good enough” quality.² This equals what is generally
considered light PE.

The criteria for “good enough” are defined by TAUS as follows: 


• comprehensible: the contents of the text should be comprehensible
• accurate: the meaning of the source text should be preserved
• but stylistic quality plays a minor role


This means that the text may appear unidiomatic and unnatural as it is gener- 
ated by a computer. The grammar and syntax can be incorrect as long as the 
meaning is comprehensible. Concerning the guidelines for light post-editing, 
TAUS puts it this way: 


• Aim for semantically correct translation.

• Ensure that no information has been accidentally added or omitted.

• Edit any offensive, inappropriate or culturally unacceptable content.

• Use as much of the raw MT output as possible.

• Basic rules regarding spelling apply.

• No need to implement corrections that are of a stylistic nature only.

• No need to restructure sentences solely to improve the natural flow of the
text.


¹Find the guidelines and further recommendations on their webpage, last accessed on 28 April
2021.
²See also the discussions under the buzzword “fit-for-purpose translation”, e.g. Bowker (2020a).


The greatest challenge in light PE for most professional translators is leaving 
incorrect grammar and syntax unedited as we are usually used to creating high- 
quality translations. Keep in mind that you will not be paid for those corrections 
and try to remind yourself that for this job high quality is not needed. Light PE 
requires training and it will become easier to adhere to the guidelines after a 
while. 


4.1.2 Full PE 


Another set of guidelines are the guidelines for achieving quality that is similar 
or equal to human translation, often called full post-editing. Besides being com- 
prehensible and accurate (see above), stylistic quality is also important for full 
PE. However, it may still not be as good as it would be when translated from 
scratch. Syntax, grammar and punctuation need to be correct. Concerning the 
guidelines for full post-editing, TAUS puts it this way: 


• Aim for grammatically, syntactically and semantically correct translation.

• Ensure that key terminology is correctly translated and that untranslated
terms belong to the client’s list of “Do Not Translate” terms.

• Ensure that no information has been accidentally added or omitted.

• Edit any offensive, inappropriate or culturally unacceptable content.

• Use as much of the raw MT output as possible.

• Basic rules regarding spelling, punctuation and hyphenation apply.

• Ensure that formatting is correct.


Full PE is more in line with the quality standards most professional translators 
are used to. However, it is still important to remember to use as much of the raw 
MT output as possible and to not get too lost in the fine-tuning. 


4.1.3 Monolingual PE 


Especially when talking to laypersons, the argument might come up that a post- 
editor only needs to be fluent in the target language, because the translation was 
created by the machine and the fine-tuning of the MT output takes place in the 
target text. However, think about whether you as a professional translator would 
be willing to revise a translation for which you do not know the source language 
or for which you do not have access to the source text. You would probably be 
reluctant to do so, because you know that many mistakes cannot be identified 
without the source text, especially content mistakes. And, as you can probably 
guess, this is also true for post-editing MT output - maybe even more so. 

Bilingual PE involves the comparison of source and target text. This means 
that the post-editor has to check the quality of the translation but he/she also has 
to assess whether the adequate meaning of the source text has been transferred 
by the MT system. In contrast, monolingual PE (MPE) suggests that the quality 
control of the translation can be carried out without taking the source text into 
account. The professional translation market and research on PE in translation 
studies hardly discuss MPE, but you should be aware that the option of MPE 
might come up when negotiating with clients. To have some arguments prepared, 
we want to discuss the topic briefly and present some findings of research on 
monolingual PE. 

Mitchell et al. (2013) showed in their study that monolingually post-edited sen- 
tences are most often rated as an improvement to the pure MT output. However, 
there are also a number of incidents where the monolingual PE processes even 
negatively affected the final quality. This shows that monolingual PE mostly has 
a positive or neutral effect on the MT output, but it does not say whether the 
final product was good or even acceptable — it was merely better than the pure 
MT output. 

In her study, Nitzke (2016) compared the translation and PE products of 24 
participants (twelve professionals, twelve students) for six texts that were trans- 
lated/ post-edited from English to German. She found that “superficial” mistakes 
like grammar, spelling, punctuation, etc. occurred similarly often in the final tar- 
get texts for all three tasks (translation, bilingual PE, monolingual PE). However, 
content mistakes could be found much more frequently in the monolingually 
post-edited final texts than when the texts were translated from scratch or
bilingually post-edited. We have to keep in mind, though, that both studies used MT
output of statistical MT systems.

When we talk about NMT, the picture is similar, with one important difference.
Studies (e.g. Burchardt et al. 2017 or Macketanz et al. 2017) showed that
NMT-generated translations still contain many errors, although it might seem that the
quality of the MT output has improved for some language pairs as the output can
be read more fluently. However, we assume that it might become even more
difficult to find certain error types, especially content errors, even in bilingual
PE, because the MT output is fluent and seems to be correct. Hence, the content
mistakes are, in a way, more hidden.

Even though there might be some scenarios where monolingual PE may be 
sufficient or better than no PE at all (similar to how proof-reading a target text 
instead of revising it against its source text might be better than no quality assess- 
ment at all), we discourage you from engaging in monolingual PE in professional 
PE settings, as some mistakes in the MT output cannot be found without consulting
the source text. Since monolingual PE neglects the equivalence relations between
source and target text, it is not possible to evaluate whether the output is
accurate, i.e. whether the meaning of source and target text is identical. This
is why we strongly recommend refusing jobs involving solely monolingual PE.
Adherence to given standards (§4) is not possible for this kind of task, and
questions concerning risk assessment and liability (§7) cannot be addressed.


4.2 ISO 18587 - the post-editing standard


In addition to the general guidelines, there is also an ISO standard addressing pro- 
fessional PE workflows. The PE standard ISO 18587 is called “Translation services 
— Post-editing of machine translation output — Requirements.” After having in- 
troduced the differences between light and full PE, we will now focus on this 
international standard and its requirements. We refer to the first edition which 
was published in April 2017. 

The standard first discusses the reasons and advantages of post-editing ma- 
chine translation: Translation costs can be decreased, the launch of products or 
the flow of information can be accelerated, translation productivity as well as the 
turn-around times can be improved, and translation service providers can remain 
competitive in a globalised world. Additionally, MT gives clients the possibility 
to translate material that could otherwise not be translated at all. 


“However, there is no MT system with an output which can be qualified as 
equal to the output of human translation and, therefore, the final quality of 
the translation output still depends on human translators and, for this pur- 
pose, their competence in post-editing.” (DIN ISO 18587 2018: Introduction) 


This is an important statement that can help to manage expectations of what
MT and PE can and should deliver. The ISO standard does not apply to general
MT development or general translation processes, but only to PE processes.
Further, its scope is limited to “the process of full, human post-editing of
machine translation output and post-editors’ competences”. This means that it
does not apply to light PE jobs, to automatic PE, to interactive PE, or to
monolingual PE.

The standard coins its own definition of PE: “Post-editing is performed on MT
output for the purpose of checking its accuracy and comprehensibility, improving
the text, making the text more readable, and correcting errors.” PE differs
from regular translation because it involves three texts that need to be
processed and not only two, namely the source text, the MT output, and the final
target text. The two main varieties of PE, which we have already mentioned, are
light and full PE, and the standard defines both: Full PE is the “process of
post-editing to obtain a product comparable to a product obtained by human
translation.” In comparison, light PE is the “process of post-editing to obtain
a merely comprehensible text without any attempt to produce a product comparable
to a product obtained by human translation.” The decision which type of PE is
needed depends on the purpose and requirements of the final text, the PE brief
and the client.

Three actors are assumed in the PE process according to the standard: the 
client, the translation service provider (TSP), and the post-editor. The focus of 
the standard is on the role of the translation service provider. However, the other 
roles are defined indirectly as well. According to the standard, the translation 
service provider has to determine whether the source text content is suitable for 
MT and accordingly for PE.³ The efficiency depends on the kind of MT system,
the languages, the domain and the text type. The TSP has to decide whether 
pre-editing is reasonable before the text is machine-translated. Further, relevant 
specifications have to be communicated to the post-editor such as who the target 
text readers are or what quality level is aimed for in the target text. The TSP has 
to ensure that the source content is in an appropriate format so that the
post-editor can access it as well as any reference material or other resources.
The TSP shall inform the post-editor about how useful the MT output is expected
to be.

³ We will also discuss this topic in §8.

Similar to the TAUS guidelines, the standard defines the following aims for the
post-editing process:


• the post-edited MT output must be comprehensible

• the content in the source text must correspond to the content in the target
language

• the post-editor must comply with the agreed requirements and specifications


The final target text should meet the following requirements (remember that 
the standard focuses on full PE!): The terminology must be consistent and com- 
ply with domain-specific requirements. Syntax, spelling, punctuation and other 
orthographic characteristics as well as formatting must be correct. If applicable, 
specification according to relevant standards must be satisfied. The post-editor 
must consider the target audience and the target purpose of the final text. Typi- 
cally, the client and the translation service provider have agreed upon all require- 
ments in advance. 

The post-editor’s tasks should be to first evaluate whether the MT output 
needs any editing at all referring to the source text and then to provide a tar- 
get text either by using the existing machine-translated elements or by creating 
a new translation. Finally, the translation service provider should check the final 
text and deliver it to the client. The post-editor should be able to give feedback 
about the performance of the MT system, so that weaknesses are known and the 
system can be improved. 

The ISO standard describes six competences, which are mandatory for post- 
editors: The post-editor needs to have translation competences as well as linguis- 
tic and textual competences in the source and target languages. The post-editor 
must also be able to conduct efficient research to find and process information. 
Cultural competences are necessary to ensure that the target text audience un- 
derstands the final text. Further, the post-editor needs technical competences to 
be able to process the text using the appropriate tools. Finally, the post-editor 
must have knowledge of the domain that the text deals with.⁴ The qualifications
of a post-editor must be similar to those of a translator and should either include 
a degree in translation, another university degree and two years of full-time pro- 
fessional experience in translation or post-editing, or five years of full-time pro- 
fessional experience in translation or post-editing. 


⁴ We will present our competence model in §9.1. As you will see, there are many
similarities and also some differences and enhancements.

Last but not least, the standard proposes instructions for full PE tasks: The
final text should be accurate, comprehensible, and stylistically acceptable. Gram- 
mar, syntax and punctuation should be correct. The aim is to create a final text 
that cannot be distinguished from a human translation. Nonetheless, the post- 
editor should use as much of the MT output as possible. The post-editor should 
concentrate on the following aspects while post-editing: He or she has to make 
sure that no information has been added or omitted. Any inappropriate content 
must be edited. Sentences should be restructured if the syntax is incorrect or if 
the meaning is not clear. As mentioned before, grammar, syntax and punctuation
should be correct. The same applies to spelling and hyphenation.

In conclusion, we would like to emphasise that the aim of this chapter was 
to present an overview of the ISO standard for PE. For further details or if you 
are considering becoming certified or working according to the standard, we 
recommend buying and studying the standard. 
Crossword puzzle - chapter 4

[Crossword grid not reproduced]

Across
1 The quality of light post-edited texts should be “good ...”
3 In both full and light PE, we should use as much of the raw MT ... as possible.
6 What kind of PE does the PE standard ISO 18587 focus on?

Down
2 In light PE, grammar and ... can be incorrect as long as the meaning is comprehensible.
4 In full PE, key ... needs to be translated correctly.
5 What kind of PE only uses the MT output for PE?
5 MT and text types - which influence do they have?


Learning objectives 


You will learn... 


• to assess which text types are more and which are less suitable for machine
translation,

• what a controlled language is and how it influences MT output.


This chapter deals with the special characteristics of text types and their effect
on MT and PE. It covers text types only; the PE risks associated with each text,
which also influence the decision whether to use MT and PE, are discussed in §7.
Although considerations on text types are very important for decisions concerning
the PE project, the basic principles and assumptions can be explained quickly.

The text type is a very important factor to assess whether MT will be useful 
and effective for a given source text. Very creative texts are seldom considered 
suitable for MT because they require flexible translation solutions, variability 
and creativity. Poems would probably be one of the most demanding text types 
as content and form usually both play a very important role. They generally 
rely on rhythm and rhyme, which regular MT systems do not consider. When 
human translators translate poetry, they might have to translate the content very 
freely, which is usually also not possible in MT. Similarly, you will probably get 
unsuitable translations for slogans or certain advertising texts because they often 
rely on figurative language or wordplay. The concept of transcreation has become
increasingly popular for texts that need to be translated very freely (see
e.g. Pedersen 2014).




Very restrictive, highly standardised, redundant and less creative texts, on the
other hand, are well suited for MT. These are often domain-specific and adhere
to pre-defined and strict text type conventions. Often, repetitions of style and 
terminology are desired, and the authors of the texts also have to follow certain 
rules when writing the text. Especially suitable for MT would hence be tech- 
nical manuals or instructions that may even have been written in a controlled 
language. 

A controlled language (CL) is a reduced version of a natural language that 
follows certain rules and guidelines. As the name already implies, the language 
use is controlled. Often, these controlled languages use restricted vocabulary, 
where each word has only one meaning and each meaning is only represented 
by one word. The sentences are usually short and do not include complex syn- 
tactic structures. The passive voice is often avoided and sometimes even the use 
of tenses is restricted. These are only some examples for CL rules. Finally, con- 
trolled languages are often used in technical documentation, but also in other 
domain-specific communication. “Simplified Technical English” is one quite fa- 
mous example of a controlled language (e.g. Knezevic 2015). If you want to know 
more about controlled languages, you can find more information in Kamprath 
et al. (1998) or Kittredge (2003). 
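
To make the idea of CL rules more concrete, the following minimal Python sketch
checks a sentence against two of the rule types mentioned above: a limit on
sentence length and a restricted vocabulary. The word limit, the approved word
list and the function name `check_sentence` are assumptions made for illustration
only; they are not taken from Simplified Technical English or any other real
controlled language.

```python
import re

# Hypothetical limit, chosen only for this illustration.
MAX_WORDS_PER_SENTENCE = 20


def check_sentence(sentence, approved_words, max_words=MAX_WORDS_PER_SENTENCE):
    """Return a list of rule violations found in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    issues = []
    # Rule 1: sentences should be short.
    if len(words) > max_words:
        issues.append(f"too long ({len(words)} words, limit {max_words})")
    # Rule 2: only words from the approved vocabulary may be used.
    unknown = sorted(set(words) - set(approved_words))
    if unknown:
        issues.append("unapproved words: " + ", ".join(unknown))
    return issues


# Hypothetical approved vocabulary for a tiny instruction manual.
approved = {"press", "the", "button", "power", "then", "release", "it"}
print(check_sentence("Press the power button.", approved))    # []
print(check_sentence("Depress the power button.", approved))  # ['unapproved words: depress']
```

Real CL checkers cover many more rules (e.g. on passive voice or tenses) and
rely on proper linguistic analysis rather than on simple word lists.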

Using a CL in an MT workflow has been shown to yield better MT quality. Aikawa
et al. (2007), amongst others, show that the CL had a positive influence on PE 
productivity in three of four tested languages. In general, these improvements 
are especially measurable for rule-based and statistical MT systems. Marzouk & 
Hansen-Schirra (2019) published a first study on the influence of CL on different 
MT systems, including neural MT. The study shows that CL improves rule-based, 
statistical and hybrid MT; in contrast, it has little to no effect on neural MT since 
its quality is already exceptionally high for technical documentation, which was 
the text type under investigation. The study only tested a very limited set of 
rules. If the results can be applied to a more holistic rule-set, the study would 
imply that using a CL is obsolete when using neural MT. This will be the subject 
of future studies. 

Apart from using a CL, the assumptions for text types also presume that the 
MT systems were trained on the respective text types and domain. Logically, a 
system that was trained on legal texts and then has to translate medical texts 
would probably also produce a lower quality output. 

Finally, the use of MT and PE is also becoming more and more widespread 
within translation tasks which require creative translations. Toral et al. (2018), 
for instance, showed that neural MT also provides valuable results for literary 
translation. Another example is the CompAsS (Computer-Assisted Subtitling)¹
project, which aimed at researching and optimising the overall multilingual sub- 
titling process for public TV programs by developing a multi-modal subtitling 
platform combining automatic-speech-recognition, neural MT and translation 
management tools. The project results are promising since they prove signifi- 
cant gains in productivity while maintaining acceptable quality standards (Tardel 
2020). We should however keep in mind that subtitles are regarded as creative 
texts, but they are also restricted and controlled since they follow certain rules 
and standards. 

Nonetheless, MT is still considered more suitable for domain-specific, restricted 
texts than for creative texts. One general rule of thumb would be that if texts are 
suitable for translation memory (TM) systems, they might also be suitable for 
MT systems. In state-of-the-art translation workbenches, the two are even
combined, i.e. MT candidates are suggested when there are no matches or when
the TM fuzzy matches fall below a pre-defined threshold. So, if you are not
considering using a TM (a possible case might be the translation of a novel or an
advertisement), you should not use an MT system either. We will talk about the
interaction of MT, PE and other tools in the next chapter (§6).


¹ https://www.compass-subtitling.com, last accessed 19/06/2021


Crossword puzzle - chapter 5

[Crossword grid not reproduced]

Across
1 A ... language is a reduced version of a natural language that follows certain rules and guidelines.
2 What characteristic makes source texts less suitable for MT?
3 Very ..., standardised, and redundant texts are better suited for machine translation.

Down
4 For what kind of audio-visual translation have first studies tested the use of MT and PE?
6 Post-editing and tools - how do they interact?


Learning objectives 


You will learn... 


• what a translation memory system is (if you don’t already know),

• how to post-edit in translation memory systems,

• what adaptive and interactive MT is.


This chapter will present how MT output can be integrated in professional 
translation practice, especially concerning the integration in translation mem- 
ory systems. Translation memory (TM) systems are essential tools in professional 
translation workflows, often including translation memory, project management, 
and terminology management components. First, we will give a short introduction
to TM systems (§6.1). However, as most of you will already have worked with
TM systems, this introduction will be very basic and you might consider skipping
it. In §6.2, we will discuss working in a translation memory system with
MT output. And finally, in §6.3, we will talk about the latest technological
developments for using MT in PE processes. As the field is constantly evolving,
new tools and functionalities are being developed to increase productivity and
user-friendliness. Hence, we will discuss some examples, e.g. adaptive and
interactive MT.


6.1 Introduction to translation memory systems 


This will be a short introduction to translation memories and their basic
functionality. Most TM system vendors provide online tutorials nowadays if you
want to learn about a new tool. Further, most TM systems work on the same principles,
which usually makes it quite easy to use a new tool if you are already familiar 
with another one. 

So, what is a translation memory system? Basically, it saves the translations 
of texts, i.e., it is a database of previous translations. Usually, the source texts are 
segmented on a sentence basis. You can translate the text segment by segment 
and every segment is saved. Further, every source segment is compared to the 
segments that have already been translated. If the same (100% or full matches) 
or a similar (fuzzy matches) segment appears, the previous translation is pre- 
sented to you and you can decide whether you want to use the same translation, 
how much you need to edit, or whether you want to translate the segment from 
scratch. Translation memory systems can often be combined with terminology 
management systems, dictionaries, MT systems, and/or other helpful tools. 
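
The lookup described above can be sketched in a few lines of Python. Please note
that this is only an illustration under simplifying assumptions: commercial TM
systems use their own matching algorithms, whereas this sketch borrows the
similarity ratio of `difflib.SequenceMatcher` from the Python standard library
and compares segments word by word. The function names, the example memory and
the threshold of 70% are hypothetical.

```python
from difflib import SequenceMatcher


def match_score(segment, stored_source):
    """Similarity of two segments in percent (100 = full match)."""
    a = segment.lower().split()
    b = stored_source.lower().split()
    return round(100 * SequenceMatcher(None, a, b).ratio())


def lookup(segment, memory, threshold=70):
    """Return (score, stored source, stored target) of the best match, or None."""
    best = None
    for source, target in memory.items():
        score = match_score(segment, source)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


# Hypothetical translation memory with a single English-German entry.
memory = {"Press the green button": "Drücken Sie die grüne Taste"}

print(lookup("Press the green button", memory))  # full match, score 100
print(lookup("Press the red button", memory))    # fuzzy match, score 75
print(lookup("Open the lid", memory))            # None: below the threshold
```

The fuzzy match for "Press the red button" still has to be edited by the
translator, because the stored translation contains "grüne" (green).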

There are different ways to build a translation memory. Of course, you can 
build a TM with your own translations - either in one big database or you can 
save your translations thematically according to text type, domain, client, etc. 
You can import existing databases, e.g. clients often deliver a translation memory 
containing the translations of other translators. Finally, you can align existing 
translations and add them to your translation memory. The parallel texts should 
be available electronically. You can upload the source text and the target text, 
which are then automatically segmented on a sentence level and aligned. How- 
ever, the resulting alignments are sometimes error-prone and need some manual 
corrections, e.g. if one source sentence is translated with two target sentences. 

There are many reasons why translators should use translation memory sys- 
tems. The translations are stored efficiently and it is much easier to recall what 
has been translated before. In other words, the work that has been done before 
can be reused easily. Accordingly, the matching process is efficient, too. We not 
only retrieve what appeared in exactly the same phrasing, but also what is similar 
to the current sentence or phrase. To put it in a nutshell, redundant work is re- 
duced, which saves time and makes the translation process more efficient. Due to 
special search functions, linguistic features can be used more consistently, which 
increases the quality in most text domains. It is unlikely that you will skip a seg- 
ment during the translation as the texts are preprocessed. Further, the systems 
already extract the text that needs translation. Hence, you are unlikely to damage
the file, e.g. when working with HTML or XML files.

Working with translation memory systems can also have some disadvantages. 
It might not be advisable for every text type to have a consistent, repetitive style, 
especially if the texts are very creative. Another disadvantage is that you proba- 
bly only read the text segment by segment. If you have a lot of exact and fuzzy 
matches, you might not understand the context of the individual sentences
correctly and minor or major errors can occur. All in all, the advantages usually
outweigh the disadvantages, especially for domain-specific text types.

There are many translation memory suppliers and their systems might have 
different pros and cons, but their basic functionality is the same. At this point, 
we do not want to recommend any system in particular. For most systems, you 
have to buy a license or the software, but there are also some free tools, which 
usually do not have many advanced functionalities. Nonetheless, they do the job. 
You might also find a free test version or free test days for some systems. You will 
probably have to decide which is the best tool for you in the long run - or your 
clients, who might use a particular system themselves. Hence, it often makes 
sense to be flexible and be able to use different systems. 


6.2 Machine translation in translation memory systems 


This section will present how MT is integrated and presented in different tools.
Of course, every system is (at least) a little different, but usually the concepts are
the same or similar across tools. Thus, you can transfer what you learn here to
the tools you use in your everyday life or will use in the future.

First, you have to know how MT is integrated or can be activated and deac- 
tivated in the respective translation memory system. The MT can be activated 
right away - in this scenario you should check what kind of MT system is acti- 
vated and if it is suitable for your project (concerning quality, data security, etc.) 
— or you have to activate the MT component manually. The systems often offer 
different MT implementations in the standard settings. However, it is usually 
also possible to download or purchase other MT systems for the respective TM 
system. 

After you have selected an MT system for your project, the MT output will 
often be automatically inserted into empty segments, i.e. segments without full 
or fuzzy matches from the TM storage. For the latter, you can even define the 
threshold at which MT suggestions might replace fuzzy matches. However, the 
MT suggestions might also be presented in addition to the different TM matches. 
You can, accordingly, use the MT output as an additional option, insert it into the 
segment (which often happens automatically), and post-edit it. When the post- 
edited segment is confirmed, it is added to the translation memory. 
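
The decision logic described in this paragraph can be summarised in a short
Python sketch. It is a simplification and not the behaviour of any particular
tool: the function name `choose_suggestion`, the labels and the default threshold
of 75% are assumptions for illustration.

```python
def choose_suggestion(tm_score, tm_target, mt_output, mt_threshold=75):
    """Decide which suggestion is pre-inserted into a segment.

    tm_score is the best TM match in percent, or None if there is no match.
    Fuzzy matches below mt_threshold are replaced by the MT suggestion.
    """
    if tm_score is not None and tm_score >= mt_threshold:
        origin = "TM full match" if tm_score == 100 else "TM fuzzy match"
        return origin, tm_target
    return "MT", mt_output


print(choose_suggestion(100, "stored translation", "MT output"))   # TM full match
print(choose_suggestion(82, "stored translation", "MT output"))    # TM fuzzy match
print(choose_suggestion(60, "stored translation", "MT output"))    # MT
print(choose_suggestion(None, None, "MT output"))                  # MT
```

Whatever the origin of the suggestion, the segment only enters the translation
memory once you have post-edited and confirmed it.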

So, the basic process is very similar to what we already know from transla- 
tion memory use in translation from scratch. Some tools also provide additional 
functionalities to measure, for example, PE effort. The plug-in “Qualitivity” for 
Trados Studio helps the user to 
track the time spent on translating, reviewing & post-editing the segments
from documents, but additionally [it] includes functionality to track every 
single change made to the segments at a granular level and a means to gen- 
erate reports based on that data in structured and readable format. (commu- 
nity.sdl.com, last accessed 31 May 2021) 
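
If you want to get a rough idea of how much you changed in a segment without
such a plug-in, you can compare the MT output with your post-edited version. The
following Python sketch counts word-level edits (insertions, deletions and
substitutions) and relates them to the length of the post-edited segment. This is
our own simplified illustration and not the method used by Qualitivity; it also
ignores the time spent and the keystrokes made, which are important aspects of PE
effort. The function names are hypothetical.

```python
def word_edit_distance(mt_words, pe_words):
    """Minimum number of word insertions, deletions and substitutions."""
    previous = list(range(len(pe_words) + 1))
    for i, mt_word in enumerate(mt_words, start=1):
        current = [i]
        for j, pe_word in enumerate(pe_words, start=1):
            cost = 0 if mt_word == pe_word else 1
            current.append(min(previous[j] + 1,          # delete a word from the MT output
                               current[j - 1] + 1,       # insert a word
                               previous[j - 1] + cost))  # keep or substitute a word
        previous = current
    return previous[-1]


def edit_rate(mt_output, post_edited):
    """Word-level edits divided by the length of the post-edited segment."""
    mt_words, pe_words = mt_output.split(), post_edited.split()
    if not pe_words:
        return 0.0 if not mt_words else 1.0
    return word_edit_distance(mt_words, pe_words) / len(pe_words)


print(edit_rate("The house are blue", "The house is blue"))  # 0.25
```

A value of 0.0 means that the MT output was accepted unchanged; the higher the
value, the more of the segment was rewritten.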


In the next section, we will talk about new and more advanced approaches to 
integrating MT. 


6.3 New approaches 


Integrating MT output into TM environments has not been the only advance- 
ment in recent years. To make the PE task less repetitive, approaches towards 
interactive and adaptive MT have been developed. 


An interactive system tries to autocomplete the text the user is going to 
type; it either predicts the text the user is going to type or changes the MT 
suggestion on the basis of what is typed, whereas an adaptive system is 
an MT system that learns from corrections on the fly and is continuously 
trained. (Daems & Macken 2019: 118) 


In other words, interactive MT systems change their suggestions while the seg- 
ment is post-edited, whereas an adaptive system learns in the background and 
adapts to the post-editor’s changes.
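
The interactive mode can be illustrated with a toy example in Python. We assume
that the MT system has produced a ranked list of alternative translations for a
segment; as soon as the post-editor types something, the suggestion switches to
the best alternative that is compatible with the typed text. Real interactive
systems usually generate a new translation under the constraint of what has been
typed rather than filter a fixed list, so the function `complete` and the example
candidates are hypothetical simplifications.

```python
def complete(typed_prefix, candidates):
    """Return the rest of the first candidate that starts with the typed text."""
    for candidate in candidates:
        if candidate.startswith(typed_prefix):
            return candidate[len(typed_prefix):]
    return None  # no candidate fits: the post-editor continues without a suggestion


# Hypothetical ranked MT alternatives for one segment.
candidates = [
    "The cat sits on the mat.",
    "The cat is sitting on the mat.",
]

print(complete("The cat", candidates))     # ' sits on the mat.'
print(complete("The cat is", candidates))  # ' sitting on the mat.'
print(complete("A dog", candidates))       # None
```

An adaptive system, in contrast, would not change the current suggestion while
you type, but would use the confirmed segment to improve later suggestions.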

Daems & Macken (2019) investigated how these different modes influence the 
PE process. They used the commercial tool LILT,¹ which integrates both interactive
and adaptive MT, but also presents TM matches. Compared to a traditional
TM system, the use of MT output is a strong focus in the LILT environment. The 
study was conducted in two rounds - the first using statistical MT, the second 
using neural MT. Eight professional translators (four per round) participated in 
the experiment. They worked as Dutch-English translators, which was also the 
language pair investigated in the study. The study found that there was hardly a 
difference between SMT and NMT concerning post-editing time and effort (mea- 
sured via keystrokes and mouse clicks), although the initial SMT output produced 
more errors. Daems and Macken argue that this might be caused by the interactivity
and adaptivity of the MT systems, the kind of errors produced by the different
systems, and individual behaviour of the post-editors. Further, they studied the
whole translation process, which also included fuzzy and full matches from the
TM component.

¹ Lilt website, last accessed 8 March 2021

The CasMaCat project (Alabau et al. 2014) provided a TM environment specialised
in PE processes with an integrated interactive SMT system. Sanchis-Trilles et al.
(2014) tested how the interactivity influenced the PE process. They asked nine
freelance translators to fully post-edit nine newspaper texts from English into
Spanish (the latter was their native language). PE was done in three modes: three
texts were post-edited conventionally, i.e. without interactivity, and
interactive systems (basic vs. advanced) were provided for the other six texts.
All participants were introduced to the CasMaCat environment and the different 
modes before the experiment. Finally, four reviewers were asked to revise one 
dataset (consisting of one text from each mode by all participants). The results 
show that the participants did not become faster in the interactive modes, but 
in the basic interactive systems, it took fewer keystrokes to post-edit the texts 
and the post-editors were only a little slower. The quality of the final output 
also did not significantly change. After the sessions, the users were asked to give 
feedback on how satisfied they were with their PE results, how much they liked 
the tools, whether they would rather have translated from scratch, and whether 
they would have preferred to work without interactive MT. The results are very
mixed and do not present a clear picture. CasMaCat used SMT systems, and the
results might be different for modern NMT systems (the results of Peris &
Casacuberta 2019 support this hypothesis). Still, we can take from this study
that interactivity might not be the perfect solution for every post-editor and
that it might take some time until you are used to the interactivity and can use
it properly. This disadvantage does not occur in adaptive systems, as the
learning process runs in the background.

Moorkens & O’Brien (2017) conducted a questionnaire study to survey what 
PE functionalities users expected from their TM environment. In total, 81% of 
the participants who replied would like to have a confidence score indicating the
expected quality of the MT output.² Some TM environments already offer this
kind of evaluation.³ Generally, these scores can help you decide more quickly
how useful the MT output is and whether to translate from scratch or use the MT
output. However, you have to keep in mind that those numbers are calculated
automatically and that the process is more complex than the fuzzy match
calculation. Hence, we would advise that you do not blindly trust a good quality
estimation but, as with most full matches, recheck the MT output.

² Around 70% of the participants also declared their interest in interactive and adaptive MT.
³ e.g. memsource, https://help.memsource.com/hc/en-us/articles/360012527380-Machine-Translation-Quality-Estimation, last accessed 11 March 2021

As PE has become an established task on the translation market, the tools and 
TM environments will further develop, and the TM environments will potentially 
increasingly adapt to the PE tasks. Hence, we can only advise that you try to keep 
up to date and inform yourself about new developments. 
Crossword puzzle - chapter 6

[Crossword grid not reproduced]

Across
2 Translation Memory Systems often contain a component for project ...
5 When an MT system is activated, the MT output is often automatically inserted into ... segments, i.e. segments without matches from the translation memory.
6 What do we call an MT system that changes the MT suggestion according to what is being typed while the translator starts post-editing?

Down
1 On what level do translation memory systems usually store translations?
3 What else can we manage on a word-level with translation memory systems?
4 What do we call results from the translation memory that are not 100% equal to the current segment?
7 What do we call an MT system that learns from the post-edited segments?
7 Post-editing risks and data security - which pitfalls can arise?


Learning objectives 


You will learn... 


• what risks can arise in translation and post-editing,

• what to keep in mind concerning data security when using MT systems.


When we talk about PE, we also have to think about possible risks and security 
concerns. In this chapter, we want to outline the most important considerations, 
so you know what you have to keep in mind when you start working as a pro- 
fessional post-editor. 


7.1 Post-editing risks assessment 


Translating texts generates risks for all actors involved in the translation process 
(Canfora & Ottmann 2015 or Canfora & Ottmann 2018). Although translation con- 
tains specific creative and cognitive aspects that alone can be the research focus 
of many scientific studies (Pym & Matsushita 2018), decisions made during the en- 
tire translation process are underpinned by the same principles as the decisions 
on any other business level. Therefore, these decisions should be made in an eco- 
nomic framework. One instrument to develop decision criteria in economics is 
risk management. Generally, business decision criteria can be differentiated be- 
tween strategic (long-term), tactical (medium-term), and operative (short-term) 
decisions (Hofmann 2012). When considering risk management for a PE situa- 
tion, the following business decisions are of special importance: 


• strategic, e.g. whether the organisation wants to use MT at all

• operative, e.g. what PE guidelines (full vs. light) are necessary for the
specific text or the respective text type with regard to the organisation’s
general strategic decisions


The international standard ISO 31000 (2009) “Risk management - Principles 
and guidelines” can be used for the translation process in all contexts, because 
it is a horizontal standard. Risk management is considered an integral part of all 
processes in an organisation including translation processes (either in-house or 
as part of a supply chain risk management). In addition to the risks that emerge 
from translation in general, the use of MT and PE generates risk factors in par- 
ticular, such as: 


data breach: Confidential information is fed into a web-based MT system
and ends up on the web (as in the case of Statoil, cf. CSO Online, last
accessed 20 August 2021).


loss of control of processes: The clients cannot control whether the trans- 
lator uses MT or the functionality of the MT system is not transparent to 
the user of the MT system at all, especially with neural MT. 


uncertain liability modalities: In cases of translation errors or problems, the
responsibilities concerning liability are not clearly defined. This especially
affects the use of MT and PE for high-risk texts. In claims for compensation
where translation mistakes cause danger to life and limb, the client might be
held partially responsible.


attitude towards MT: Clients might have difficulties in finding qualified
translators and post-editors, because professional translators might still
have prejudices against MT and PE (e.g. Cadwell et al. 2018; Guerberof
Arenas 2013; Läubli & Orrego-Carmona 2017).


quality issues: The quality of the post-edited text might not be sufficient 
for the purposes of the client or target group. 


Basically, the client should consider whether the benefits outweigh the risks
before using MT and PE. In other words, the client has to decide whether
the risks are tolerable in a given situation. This includes general considerations
arising from the client’s own “risk management policy” (cf. IRM 2002, last
accessed 20 April 2021). In the context of ISO 31000, the term risk management
policy describes a “statement of the overall intentions and direction of an
organization related to risk management” (ISO Guide 73:2009). The general risk
attitude of the organisation, that is, the “organization’s approach to assess and
eventually pursue, retain, take or turn away from risk” (ISO Guide 73, term
3.7.1.1), is an important factor in creating the risk management policy.
Accordingly, an organisation can be more or less willing to take risks, and this
so-called “risk appetite” influences strategic decisions. An organisation with a
higher appetite for risk is more willing to take the risks mentioned above than a
risk-averse organisation. These decisions are usually made on a long-term basis
and therefore usually concern the strategic part of business management
(cf. IRM 2002).

On the operative level, risk management can provide decision criteria for or 
against the use of MT and PE for certain text types. Therefore, the approach to 
risk management for translations can also be used for decision processes in MT 
and PE (Canfora & Ottmann 2018). This means that the potential risks must be 
identified before the actual translation process to foresee problems that might 
affect different actors involved in the translation process such as the translator, 
the TSP (Translation Service Provider), the client, the end user or any other agent. 
This initial analysis should consider the negative consequences of failures in the 
translation, such as impaired communication, loss of reputation, property dam- 
age, lawsuits or other legal consequences, injuries, which could even amount to 
danger to life and limb, etc. Afterwards, the likelihood that these risks could oc- 
cur in each case and the priorities of the client regarding the translation risks 
need to be analysed in compliance with the strategic risk management policy. 
This means that the client or the project manager has to decide which translation
risks must be avoided and which can be tolerated. Therefore, it is sensible
to create different categories (e.g. very high-risk, high-risk, and low-risk
documents) and to categorise the source text documents according to the risk
analysis and risk evaluation (Canfora & Ottmann 2018). In line with these
categories, different processes can be shaped for the use of MT and PE. Low-risk
texts, for example, could be machine-translated with subsequent light PE or even
without PE. High-risk texts require full PE so that a balance is created between
risk considerations and the advantages of MT and PE. For very high-risk texts,
the client has to evaluate whether a combination of MT, full PE, and additional
quality control measures such as revision ensures the necessary quality. If this
is not the case, MT might have to be ruled out entirely for those text types
because the risks are too high. Furthermore, it has to be assessed whether
combining MT, full PE, and additional quality measures is still the more efficient
option: a workflow with only human translation might provide more security and
higher productivity and reduce costs in the end.
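
The mapping from risk category to workflow described above can be sketched in a few lines of Python. This is only a minimal illustration and not part of the cited risk model: the names `Risk` and `recommend_workflow`, the flag `mt_with_qc_sufficient`, and the wording of the returned labels are assumptions made for this example.

```python
from enum import Enum


class Risk(Enum):
    """Hypothetical risk categories, following the three examples in the text."""
    LOW = "low"
    HIGH = "high"
    VERY_HIGH = "very high"


def recommend_workflow(risk: Risk, mt_with_qc_sufficient: bool = True) -> str:
    """Return an illustrative workflow label for a document's risk category."""
    if risk is Risk.LOW:
        return "MT with light PE (or raw MT)"
    if risk is Risk.HIGH:
        return "MT with full PE"
    # Very high risk: MT is only an option if full PE plus additional
    # quality control (e.g. revision) is judged to ensure the necessary quality.
    if mt_with_qc_sufficient:
        return "MT with full PE and revision"
    return "human translation only"


for risk in Risk:
    print(risk.value, "->", recommend_workflow(risk))
```

In practice, the judgement behind `mt_with_qc_sufficient` is exactly the evaluation the client or project manager has to make for each very high-risk text; the code only records the outcome.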

For more details on risks in PE and sustainable workflows for NMT read Can- 
fora & Ottmann (2020), who isolate three main risks for NMT: possible damages 
to clients and customers, liability issues, and cyber risks. 


7.2 Post-editing and data security 


Data security is a very important issue when using MT, because not all systems 
protect your texts and data. If an in-house MT system is used, security consid- 
erations are less problematic because the texts that are fed into the system are 
safely stored on an internal system or server. Still, it might be reasonable to assess 
who can access the server and the MT system. Further, users of the MT system 
should be informed about confidentiality issues, especially when working with 
externals and freelancers. The same holds true for cloud-based systems, which 
typically use secure encoding. However, if an external system without secure 
encoding and/or a free online system is chosen, the source text is often saved on 
the provider’s server and hence might become accessible to third parties. This 
can be unproblematic, e.g. if we are dealing with the translation of a website and 
this text will be publicly available anyway. However, if the data are sensitive, MT 
systems that do not provide a safe environment must be avoided (cf. e.g. Kamocki 
et al. 2015). 

The German company DeepL, which provides MT systems, clearly differentiates
between the free and the paid versions of its service. Regarding free use,
it states:


When you use our translation service, only enter texts that you are will- 
ing to transfer to our server. Transferring the texts is necessary to offer our 
service and conduct the translation. We process your texts and their transla- 
tions for a limited amount of time to train and improve our neural networks 
and translation algorithms. 


If you edit our translation suggestions, these edits will also be transferred 
to our servers to check the edit for correctness and possibly update the 
translated text according to your corrections. We also save your edits for a 
limited amount of time to train and improve our translation algorithm. 


Please note that you must not use our translation services for texts containing
any kind of personal data.


(https://www.deepl.com/privacy, last accessed 20 April 2021)


When it comes to the paid cloud version, DeepL has a much more secure policy:


When you use DeepL Pro, your submitted texts or documents will not be 
stored permanently, but only as long as it takes to create and transmit the 
translation. After transmitting the translation to you, both the texts or doc- 
uments you submitted as well as their translations will be deleted. When 
you use DeepL Pro, we do not use your texts to improve the quality of our 
service. [...] 


Please note that you can use DeepL Pro only for texts containing any kind 
of personal data if you have a job processing agreement with us [...].* 


This also means that you should never machine-translate texts you receive from
your clients without their permission, especially if you want to use an online MT
system, as the data will probably be stored by the MT system. Or, as Kamocki et al.
(2015: 15) summarise for general use:


Private users should consider translating only those bits of texts that do 
not contain any information relating to third parties (which in practice may 
limit them to translating text into, and not from, a different language to their 
own). Businesses in particular may find such a limitation rather constricting 
and to protect their own data and the data of their clients, may prefer to opt 
for a payable offline MT tool instead of a ‘free’ online service. 


Original text: “Wenn Sie unseren Übersetzungsservice nutzen, geben Sie nur Texte ein, die
Sie auf unsere Server übertragen wollen. Die Übermittlung dieser Texte ist notwendig, damit
wir die Übersetzung durchführen und Ihnen unseren Service anbieten können. Wir verarbeiten
Ihre Texte und die Übersetzung für einen begrenzten Zeitraum, um unsere neuronalen Netze
und Übersetzungsalgorithmen zu trainieren und zu verbessern.

Wenn Sie Korrekturen an unseren Übersetzungsvorschlägen vornehmen, werden diese
Korrekturen auch an unseren Server weitergeleitet, um die Korrekturen auf Richtigkeit zu
überprüfen und gegebenenfalls den übersetzten Text entsprechend Ihren Änderungen zu
aktualisieren. Wir speichern auch Ihre Korrekturen für einen begrenzten Zeitraum, um unseren
Übersetzungsalgorithmus zu trainieren und zu verbessern.

Bitte beachten Sie, dass Sie unseren Übersetzungsservice nicht für Texte mit
personenbezogenen Daten jeglicher Art nutzen dürfen.”

https://www.deepl.com/privacy, last accessed 20 April 2021

Original text: “Bei der Verwendung von DeepL Pro werden die von Ihnen eingereichten
Texte oder Dokumente nicht dauerhaft gespeichert und nur vorübergehend vorgehalten,
soweit dies für die Erstellung und Übertragung der Übersetzung notwendig ist. Nach der
Übertragung der Übersetzung an Sie werden sowohl die eingereichten Texte oder Dokumente als
auch deren Übersetzungen gelöscht. Bei der Verwendung von DeepL Pro verwenden wir Ihre
Texte nicht, um die Qualität unserer Dienstleistungen zu verbessern. [...]

Bitte beachten Sie, dass Sie DeepL Pro grundsätzlich nur für Texte nutzen dürfen, die
personenbezogene Daten jeglicher Art enthalten, wenn Sie mit uns eine
Auftragsverarbeitungsvereinbarung abgeschlossen haben [...].”

As is the case with many other applications of AI and computational linguistics,
using MT has become so common, especially as it is often embedded in webpages,
that we might become a little careless. So, we can only advise you to think
carefully before you decide to use MT systems, especially in a professional context.


Crossword puzzle - chapter 7


Across
1 On what business decision level must the kind of PE guidelines be decided?
5 Before a document is post-edited, the potential ... have to be analysed and evaluated.

Down
2 On what business decision level must organisations decide whether or not to use MT?
3 What kind of information should not be fed into a web-based MT system?
4 The responsibilities concerning ... are not clearly defined for PE yet.


Learning objectives


You will learn... 


• which aspects are important when deciding whether machine translation
and post-editing can be used for a translation job,


• how to assess these aspects,


• who is responsible for these decisions.


According to Hofmann (2012), the translation process consists of three steps:
translation preparation, translation, and translation post-processing. Hör (2020)
assigned the same steps to the PE process in her MA thesis. While the translation
is performed by the MT system and the translation post-processing by the
post-editor,¹ the PE preparation has to be conducted by the client, the project
manager, or the post-editor him/herself. In this section, we will concentrate on this
preparatory step where different decisions have to be made, most importantly
whether MT can be used and to what extent PE is needed. The standard ISO
18587 on PE already mentions this preparation step in the PE process: “[b]esides
the general commercial aspects, there is also the question of whether the source
text is actually suitable for MT” (Wallberg 2017: 150). However, the standard does
not explain how this question is to be answered. Further, we think it is not only the
source text, but also other factors that contribute to the decision whether or not
the project is suitable for PE and what PE guidelines are needed. Therefore, we
want to present different criteria that can help customers and other decision
makers, like project managers, decide whether MT can be implemented in the
translation process, focusing first on the source text (§8.1), then on the choice of
MT system (§8.2), and finally on the target text (§8.3).

¹ Consider the changing balance: While in translation processes, the main focus is on the second
step (translation), the main focus in PE shifts towards the last step (translation post-processing).


8.1 Text types, risk considerations, and data security 


First, as already advised by the PE standard (§4.2), we want to take a closer look
at the source text. As you will remember, we have talked about the individual
components in previous chapters. Whether a source text is suitable for MT can be
determined by its text type (§5), the risks associated with the text type (§7.1), and
how sensitive the content of the text is (§7.2).

As discussed in §5, it is usually advisable to use MT for text types that are not 
very creative, contain redundancies, and may have been created using specific 
guidelines and rules or even using a controlled language. Very creative texts such 
as different forms of literature, marketing texts, or slogans should generally be 
translated from scratch, because they also need creative translations that might 
differ greatly from the source text (e.g. in transcreation jobs, Pedersen 2014). Here,
we would also like to mention that each text has individual characteristics, even
if it belongs to the same text type as others. Hence, texts from the same or similar
text types might be more or less suitable for MT.

MT output is often quite linear to the source text, which is often not desirable 
in creative texts. Accordingly, the PE effort might be too high. Further, it could be 
argued that PE might suppress creative translations. However, O’Brien (2012: 113) 
claims that “this [editing as a less creative task] is certainly open to debate - can 
we really argue that improving or correcting what an author has written is ‘less
creative’ than translating another author’s words?” Restricted texts, on the other
hand, like most domain-specific communication, are in general more suitable
for MT because linearity and similarity to the source text are acceptable and often
even favoured. Hence, the question at the beginning of the potential PE job is
whether MT can be used at all in the specific situation. Therefore, it must be 
assessed whether the text is suited for MT. As a rough rule of thumb, we suggest 
that if the texts seem suitable for processing with the help of translation memory 
software, they can also be processed by MT (Arnhold et al. 2017: 221-224). In 
recent years, the borders have become more and more blurry and it is becoming 
difficult to tell by the text type alone whether or not to use MT. At this point, we 
might already have to consider the purpose of the translation. To summarise: if
we want to translate poems for publication, we probably should not use MT and
PE; however, if we have a very strict deadline to provide the subtitles for a new
episode of a very successful TV series, MT and PE might help us achieve that goal.
Especially for subtitling, PE is becoming more and more common and has been
researched more extensively than in other branches. For example, the COMPASS
project² has investigated PE in the subtitling process (Tardel 2020). Studies
on the use of MT systems for literature have also become a focus of research, with
promising results (Toral et al. 2018).

Next, we have to consider whether the risk of using MT is manageable. Risk 
considerations as introduced in §7.1 present a reverse picture to the considera- 
tions on text types. As a general rule of thumb, text types that are very restricted 
and straightforward are often the ones that are the riskiest. Warning messages, 
for instance, are typically written in a specific, very restrictive, repetitive way 
so that they are understandable, clear and explicit. This is done because they are 
very high-risk texts, which do not allow for creativity. These text types might not 
be suitable for MT, either, because the risk of mistakes and misunderstandings 
is too high. Hence, we agree with Massey & Ehrensberger-Dow (2017: 303-304)
that “[r]epetitive, controlled content such as user documentation and user
interfaces will be increasingly covered by MT as it improves. However, marketing
and brand content will remain the preserve of human translation”, but only if we
keep in mind that “[a]lthough routine translation work can and will increasingly
be done by automated solutions such as NMT, the responsibility still lies with
humans to decide in each case whether the risks of mistranslations and other errors
are ethically acceptable” (Massey & Ehrensberger-Dow 2017: 309). Sometimes, it
might be sensible to combine MT, full PE, and a revision step to ensure that
the final target text matches our quality requirements and risk considerations.
For other texts, it might be sensible to forgo MT altogether.

Finally, the decision maker has to assess whether the available MT system protects
the contents of the source text sufficiently (§7.2). If the information covered
in the source text is not at all sensitive or maybe even publicly available already, 
then the use of an open, free MT system might be suitable. However, if the con- 
tents of the source texts are sensitive, a secure MT system becomes necessary. 

In summary, the following three questions are relevant when assessing whether
the source text is suitable for MT and PE:


• Is the text type of the source text suitable for MT?


• How high are the risks that come with the source text?


• How sensitive is the information in the source text? Is it protected
in the MT system?


² https://www.compass-subtitling.com/, last accessed 19 June 2021


8.2 MT quality


An essential factor in making a decision for or against MT and PE is the quality 
of the MT output. The quality can be influenced by various factors: 


• MT system, training data, the language pairs (e.g., close vs. distant
languages) and data security


• Source text quality (including factors like source text defects and controlled
languages)


We have already discussed most aspects of the first point. The different MT 
systems and their benefits and disadvantages were presented in §3. The quality 
of the MT output is of course influenced by the training data. If there is sufficient 
high-quality training data for the language pair and the specific text type, the 
MT output will be as good as possible. Distant languages tend to be more difficult 
for MT systems (e.g. Alam 2013) as distant languages tend to have fewer linear 
translations. Finally, we discussed data security in §7.2. 

If the MT system is well-trained, the output becomes better and less effort is 
needed for PE. Bi- or multilingual corpora are necessary to train a data-driven 
MT system. These corpora need to contain well-aligned, high quality transla- 
tions. Moreover, the quality of the MT output improves if the engine is trained 
on domain-specific texts and on the respective text type (Gavrila & Vertan 2011). 
Therefore, it is advisable for companies and LSPs with a lot of multilingual texts 
to train their own systems, because the closer the training material is to the 
source texts, the more precise the MT output becomes. Hence, reliable bi- or 
multilingual data are needed to train the systems. Translation memory and term
base data, for example, can be very valuable for this purpose, since they contain
former translation solutions and terminology specifications. However, if not
many in-house data are available in general, e.g. if a language has only been
introduced for translation recently, or if the translation memory and term base data
are not of high quality (e.g. the translation memories might contain raw
translations, because the revision of these translations was not performed in the
tool), it might be necessary to rely on external corpora. Still, no or only very
little data might be available for some language pairs. Hence, it might be
problematic or even impossible to train a system. Furthermore, a decision needs to
be made concerning what kind of MT system (statistical, hybrid, neural) can be
trained and whether the training is to be done in-house or outsourced to a supplier.
If no in-house MT system can be used or trained for technical or financial reasons,
it might be possible to use an external or free online MT system, although
considerations regarding data security (§7) are vital when making such a decision
(Kamocki et al. 2015).

Another aspect we want to discuss briefly is pre-editing. If resources allow,
texts for machine translation can be carefully pre-edited. Pre-editing is the pro- 
cess of tailoring the source text to better fit MT purposes, e.g. by using style 
guides and controlled terminology to improve the MT output. Pre-editing fo- 
cuses on modifying input sentences in order to prevent predictable problems 
usually encountered by machine translation systems. The aim of pre-editing is 
to improve the quality of the MT output, either in terms of comprehensibility or 
PE efficiency. As the name already implies, pre-editing is done before MT. There 
is often a tipping point at which resources are better spent on pre-editing than 
on PE, or vice versa. For example, pre-editing might be sensible if the source text 
is machine-translated and post-edited into many different target languages. 
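
The tipping point can be made concrete with a simple break-even calculation. The sketch below assumes, purely for illustration, that pre-editing is a one-off effort on the source text and that it saves the same amount of PE time in every target language; the function name and all figures are hypothetical.

```python
import math


def break_even_languages(pre_editing_hours: float,
                         pe_hours_saved_per_language: float) -> int:
    """Smallest number of target languages for which pre-editing pays off."""
    if pe_hours_saved_per_language <= 0:
        raise ValueError("pre-editing must save some PE time per language")
    # Pre-editing pays off once the total PE time saved across all target
    # languages strictly exceeds the one-off pre-editing effort.
    return math.floor(pre_editing_hours / pe_hours_saved_per_language) + 1


# Hypothetical figures: 6 hours of pre-editing, 1.5 hours of PE saved per language.
print(break_even_languages(6.0, 1.5))  # 5
```

With these invented figures, four target languages would only recover the pre-editing effort, whereas a fifth language makes pre-editing the cheaper option. Real savings will of course differ between language pairs.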

When SMT was still the state of the art, source texts written in controlled
languages were considered especially suitable for MT. In neural MT, controlled
languages seem to have almost no influence on the quality of the MT
output (see also the discussion in §5). As a reminder, according to Ferlein &
Hartge (2008: 39-41), controlled languages restrict natural languages according 
to pre-defined rules. They claim that the aim of controlled languages is to in- 
crease readability, translatability, and the reusability of texts by consistent, clear, 
and target-oriented writing. Thus, controlled languages are usually constructed 
for very specialised communication needs, i.e. domain-specific communication. 


Let us now focus on the quality and related characteristics of the source texts 
as these are also very influential on the final MT output. The quality of the MT 
output can easily be decreased if the source text is defective, very complex, or 
very inconsistent. 

Finally, let us talk about the influence of source text defects on the MT out- 
put. Source text (ST) quality plays an important role in every kind of translation 
process, but we will now focus on source text errors in MT and PE. Source text 
defects are quite common (Horn-Helf 1999: 162-210) as the original texts often 
have to be created under a lot of time pressure, which can lead to errors. These
errors can be manifold, ranging from simple spelling mistakes or faulty
punctuation, through grammatical or structural errors (at both the micro- and
macro-structural level), to incorrect terminology, content-related errors, and
factually incorrect content. These errors, of course, influence MT output to varying
degrees. Depending on the MT engine, errors might be ignored and transferred,
not translated at all, or corrected automatically (e.g. spelling mistakes). The job
of the post-editor is to recognise not only errors in the MT output but also ST
defects and to act accordingly (similar to how translators have to act in the
translation process).

Let us have a look at a few more concrete examples of how source text defects 
can impact MT and thus the PE process. 


ST (English): (1) Remove filter cover. (2) Replace filter element. (3) Replace 
cover. 


MT (German): (1) Entfernen Sie den Filterdeckel. (2) Ersetzen Sie das
Filterelement. (3) Abdeckung wieder anbringen.


MT (Spanish): (1) Retire la cubierta del filtro. (2) Vuelva a colocar el
elemento filtrante. (3) Vuelva a colocar la cubierta.³


Here, we have an excerpt from a car manual with steps for exchanging a filter. 
The source text was taken from Schmitt (1999: 91). The source text uses the term 
filter cover and the hyperonym cover for the same object. The text itself remains 
ambiguous and not explicit in terms of whether the filter cover from the first 
step should be replaced or whether we are talking about another cover. It would 
be more consistent and typical of technical documentation to use the term filter 
cover twice. This would also adhere to most controlled language style guides. As 
a result, the MT output for German uses two different terms (Filterdeckel and
Abdeckung), which is even more inconsistent than the source text. However, the
MT system cannot be blamed for this inconsistency since the problem was caused 
by the source text. It is a rather simple example of how inconsistent terminology 
can lead to errors that might not be identified by the post-editor. 
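
Inconsistencies of this kind can be flagged automatically before the text is sent to MT. The following sketch assumes a hypothetical helper, `bare_uses`, which reports segments in which the head noun of a multi-word term (here cover) appears without the full term (filter cover). It is a simple heuristic written for this example, not a feature of any particular tool.

```python
import re


def bare_uses(segments, full_term):
    """Return segments that use the term's head noun without the full term."""
    head = full_term.split()[-1]
    full_pattern = re.compile(rf"\b{re.escape(full_term)}\b", re.IGNORECASE)
    head_pattern = re.compile(rf"\b{re.escape(head)}\b", re.IGNORECASE)
    flagged = []
    for segment in segments:
        # Remove full-term occurrences first, then look for a leftover head noun.
        remainder = full_pattern.sub("", segment)
        if head_pattern.search(remainder):
            flagged.append(segment)
    return flagged


steps = ["Remove filter cover.", "Replace filter element.", "Replace cover."]
print(bare_uses(steps, "filter cover"))  # ['Replace cover.']
```

A flagged segment is not necessarily wrong, since the shorter word may refer to a different object. It merely tells the pre-editor or post-editor where to check the reference.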

Another error type we want to discuss is misspellings. Depending on the
gravity of the typo, some MT systems can handle these types of errors rather easily
because they use an automatic spell checker of the kind known from search
engines. Based on word frequencies, the MT system automatically chooses the
most probable candidate for the source text segment. Typos that result in other
existing words are critical, because the MT system may then fail to recover the
intended meaning from the context.
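
The idea of frequency-based correction, and why real-word typos escape it, can be illustrated with a toy spell checker. This is a minimal sketch in the style of well-known edit-distance correctors and makes no claim about how any specific MT system works internally; the word list `FREQ` and its counts are invented for the example.

```python
from collections import Counter

# Hypothetical word frequencies standing in for counts from a real corpus.
FREQ = Counter({"filter": 50, "cover": 40, "remove": 30,
                "replace": 30, "form": 20, "from": 60})
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word: str) -> str:
    """Return a known word unchanged, else its most frequent known neighbour."""
    if word in FREQ:
        return word  # a real-word typo such as "form" for "from" passes unnoticed
    candidates = [w for w in edits1(word) if w in FREQ]
    return max(candidates, key=FREQ.__getitem__) if candidates else word


print(correct("fliter"))  # filter
print(correct("form"))    # form
```

The second call shows the critical case: because form is itself a valid word, a purely word-based check has no reason to suspect that from was intended.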

If you do find source text errors, you can either correct them or decide not to
do so. Whatever you decide, make sure to inform your client about these
source text defects. If this is not possible because of poor communication or time
constraints, you can always leave a comment. Be aware that text-specific subject
knowledge is essential to prevent the transfer of content-related source text
defects into the target text. The transfer of source text defects can be damaging to
your reputation, even though source text defects may be harder to find in the
PE task than in the translation task, as the focus is less on the source text
(see e.g. results in Nitzke & Oster 2016).

³ German and Spanish MT output generated by deepl.com on 17/12/2020.


8.3 Turnaround time, life span of translations, and available resources


We want to discuss three aspects in this section that focus on the creation and use 
of the target text. We have not yet considered these production-related aspects, 
but they are also very important for the decision for or against MT and PE. 

First, we need to consider how much time is available when we decide whether 
or not to use MT and PE. When releasing products on different markets, it is 
often essential that they are released (almost) simultaneously - this applies to 
technical products as much as to films or TV series. Therefore, translations are 
often needed very quickly. If the deadline is very tight, MT and PE might be the 
best solution because translation time can be reduced immensely (Carl et al. 2015 
or Nitzke & Oster 2016). However, the quality of the target text usually still needs 
to be very high. Often, you might get the impression that time and money are
the leading aspects in post-editing jobs. Time and money certainly play a role,
but in our opinion they should not be the most important aspects when making a
decision for or against MT and PE, because quality considerations (should)
outweigh time pressure. The aspects we described in §8.1 and §8.2 should have
priority over time and money considerations, as the potential damage can exceed
the economic advantages.

Further, the lifetime of the translated text might be a relevant factor. If the
texts are only needed for a short time, because they will be updated or replaced
soon, the effort for human translation might be too high. The same pertains to the
quantity of translations. If huge amounts of text have to be translated over and
over again, because the texts are generated very fast, human translation might be
too expensive or even impossible, because it would require too much effort
(Way 2013; Hu & Cadwell 2016). Possible scenarios might be the analysis of posts
in discussion forums or messages to customer services. For the latter, it might not
be possible to have personnel for every language. However, quick help is often
essential for the customer. Hence, even raw MT output might be sufficient for the
customer service personnel to understand the problem and send instructions or
information to the customer that have been prepared in the respective language.
If the problem cannot be solved that easily or if the MT output is not helpful,
the employee in customer service can still request a translation or a post-edited
version of the text.

Finally, another factor might be how many qualified translators/post-editors
are available for the job and/or how many translators/post-editors can or
should be involved in the translation process. If there is little time to translate a
long text, different translators often work on one text. This is especially common 
for text types that are very restricted and follow certain guidelines. However, the 
more people work on it, the more difficult it becomes to create a consistent text, 
even if all guidelines are satisfied. Hence, it might be plausible to have only one 
or very few post-editors to work on the text instead of numerous translators who 
translate from scratch. 

In summary, the following three questions are relevant when we assess whether 
the target text can be created with the help of MT and PE: 


• When do we need the translation?


• Do we have enough qualified translators/post-editors for the job?


• How long will the translation be in use?


8.4 Decision tree for PE 


As we have seen in the previous sections, many aspects of the source text and
considerations concerning the skopos (Reiss & Vermeer 2014) of the final target
text must be taken into account to decide whether MT and PE can be used. We
combined all the above-mentioned criteria into a decision model with a tree-like
structure (Figure 8.1, Nitzke et al. 2019: 246). As already mentioned above, this
decision model is described from the point of view of the customer and/or the
person who decides whether MT will be used and what degree of PE effort is
necessary. It starts with the basic question of whether MT can be used at all and
ends with a recommendation on the use of MT and on the scope of the PE task.
Of course, this can only be regarded as a tool, and individual cases
might be resolved differently than predicted here.
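
The criteria from §8.1 to §8.3 can also be written down as a small rule cascade. The sketch below is a simplified illustration and does not reproduce Figure 8.1: the class `Job`, its fields, the order of the checks, and the wording of the recommendations are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical description of a translation job."""
    text_type_suitable: bool    # is the text type suitable for MT? (§8.1)
    risk: str                   # "low", "high" or "very high" (§7.1)
    sensitive: bool             # does the source text contain sensitive data? (§7.2)
    secure_mt_available: bool   # is an MT system that protects the data available?
    short_lived: bool = False   # will the translation be replaced soon? (§8.3)


def decide(job: Job) -> str:
    """Return an illustrative recommendation for the given job."""
    if job.risk not in {"low", "high", "very high"}:
        raise ValueError(f"unknown risk category: {job.risk!r}")
    if not job.text_type_suitable:
        return "human translation"
    if job.sensitive and not job.secure_mt_available:
        return "human translation"
    if job.risk == "very high":
        return "MT + full PE + revision, or human translation"
    if job.risk == "high":
        return "MT + full PE"
    return "raw MT" if job.short_lived else "MT + light PE"


manual = Job(text_type_suitable=True, risk="high",
             sensitive=True, secure_mt_available=True)
print(decide(manual))  # MT + full PE
```

Such a cascade can only support the decision maker. Factors like MT quality for the language pair or the availability of qualified post-editors still have to be judged case by case.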

Hör (2020) applied the decisions to the perspective of a language service pro- 
vider. She suggests that not all project managers should make decisions about PE 
projects, but only those who have PE experience. Another scenario could be that 
there is one (or more) designated PE expert(s). Further, she proposes to include a 
step for expectation management and pricing in the model - the latter referring 
to the aspect that the amount of money put into the project guides the extent of 
the PE effort. 


Crossword puzzle - Chapter 8


Across
1 What are the three stages of the translation process according to Hofmann 2012?
Translation ..., translation, and translation post-processing
3 When we assess whether a source text is appropriate for MT and PE, what are the
three criteria we have to watch out for? Suitability of the source text for MT,
risks, and ... of information
6 When a text is written in a controlled language, what is improved for the target
audience?
7 What kind of mistakes in the source text might be corrected automatically?
8 What temporal aspect might cause very tight deadlines and might be a reason for
PE, but should still not outweigh quality considerations? ... time

Down
2 What do we call the quality assurance step after a text has been fully post-edited
or translated?
4 What kind of languages tend to be more difficult for MT because they are usually
less linear?
5 The source text decreases the quality of the MT output when it is very complex,
very inconsistent, or ....
9 Post-editing profiles - which competences are needed?


Learning objectives 


You will learn... 


• which competences are important for MT and PE,
• which job profiles might be interesting in the field of PE,
• what is necessary for training.


9.1 PE competences 


Many aspects have to be considered when dealing with MT and PE. Post-editing 
is a complex task. Accordingly, a qualified post-editor needs specific compe- 
tences to be able to fulfil all the requirements of such a task. The proposed PE 
competence model (Figure 9.1) is a further development of Nitzke et al. (2019) 
based on PACTE’s (2003) translation competence model and the revision com- 
petence model by Robert et al. (2017) since they share some of the competences 
needed for post-editing MT output. The differences and commonalities will be 
explained in the following. Further, not all decisions are necessarily made by 
the client — some clients might need a lot of guidance when it comes to MT. A 
post-editor, therefore, must be able to make informed decisions concerning risk 
assessment as well as the integration of MT and PE in the translation workflow. 

Figure 9.1: PE competence model (foundation: translation competence, including
bilingual, extralinguistic, and research competence; pillars: error handling,
MT engineering, consulting)

If we consider the competence model as a house of PE competences, the
architecture of the house is grounded on the basic competences we also expect
from professional translators: translation competences, including bilingual,
extralinguistic and research competence. This is also the basis for a skilled
post-editor. Translation competences have been described in different models over
the years, e.g., PACTE (2003), EMT (2009), Göpferich (2009). It is essential that
post-editors are skilled translators, because they need the same basic skill set.
These skills include amongst others knowledge about text type conventions, the 
ability to deal with style guides or controlled languages, knowledge about con- 
trastive differences, cultural specificities, etc. As they automatically learn from 
data, the machines might be able to recreate some text type conventions or cul- 
tural differences, but, accordingly, the training data have to be very good and 
specific. Nonetheless, chances are high that the machine will make mistakes, es- 
pecially in these areas. Hence, a professional translator is needed to post-edit the 
machine-translated output, especially when the goal is a high-quality target text.

Similar to translation and revision competence, a post-editor has to have pro- 
ficient knowledge of the source and target language, as monolingual PE always 
bears the risk that content mistakes are not recognised (see §4.1.3). This basic 
competence is often referred to as bilingual competence in translation compe- 
tence models. 

Another similarity to specialised translation and revision tasks is that a post- 
editor also needs to have general world knowledge as well as the relevant do- 
main knowledge in order to properly understand the thematic subject of the 
source text. This knowledge can be summarised with the term extralinguistic 
competence. Knowledge about cultural domain differences helps the post-editor 
to interpret the meaning in the source text correctly and transfer it adequately 
to the target text. 

Finally, a post-editor also needs to know where and how to find information
he or she does not have, i.e. research competence is required. Depending on the
thematic field of the translation, specialised (online) dictionaries might be the
first choice, whereas, for others, parallel corpora or thesauri might be better
options. Efficient research strategies positively influence the workflow time of a
PE task. Further, the post-editor needs to learn to what extent he or she can trust
the MT output, e.g. concerning the correct translation of terminology, and when
MT decisions have to be changed.


The three pillars of the model define the additional competences: error han- 
dling, which includes error spotting, error classification, and error correction; 
MT engineering, which includes training and assessment of MT systems; and
specialised consulting competences, which are tailored to the needs of post-editing
jobs. Let us have a closer look at these competences, which will be used to de- 
scribe different job profiles later in §9.2. 

Let us first discuss the error-spotting competence. Different MT systems (rule- 
based, statistical, neural, etc.) generate different errors. Hence, it is important to 
know what kind of system will be used and to have knowledge of the approach 
used in the MT system. In recent years, neural MT has become the state of the
art. Nevertheless, it is still plausible that statistical or hybrid systems will be
used, or that other MT approaches will be developed in the future. For now, let us
focus briefly on neural MT systems. Many errors generated by neural MT systems are
more difficult to identify compared to statistical MT since the MT output is more 
fluent and seems to be correct, which leads to the problem of overlooking mis- 
takes that are not obvious (Toral et al. 2018, e.g., show that pauses become fewer 
but longer when post-editing neural MT output). Therefore, the post-editor has 
to be trained to spot exactly these more fine-grained mistakes and problems. Fur- 
ther, the post-editor has to work efficiently and has to know which errors have to 
be corrected to what extent according to the respective guidelines. This means
that the trained post-editor should be able to identify what kind of mistake (s)he
stumbled across and whether this mistake has to be corrected or not according
to the job-specific guidelines in order to avoid over-corrections or over-editing
(see e.g. Nitzke & Gros 2020 and Vardaro et al. 2019). For example, different stud- 
ies have shown that post-editors are primed by the MT output (e.g. Bangalore 
et al. 2015) or at least feel primed by the MT output (e.g. Moorkens et al. 2018). 
Post-editors should be aware of these phenomena, should be able to recognise 
them, and should know how to deal with them. 

A post-editor needs some knowledge about machine translation engineering, 
although the depth of knowledge might vary across the different job profiles (see 
§9.2). As we have already mentioned, the different MT systems generate different 
problems and mistakes. Hence, a post-editor needs to know how an MT system 
works and which possible pitfalls it may entail. MT systems often generate
problems that differ from those human translators produce (Carl et al. 2015, Nitzke 2019).
Most of them are related to the architecture of the MT system. Knowing how 
MT is implemented helps to spot potential problems or difficulties. Ideally, in our 
view, a post-editor should be able to assess the quality of the MT training ma- 
terials and even to improve the training process if necessary. Some post-editors 
might even help to set up a new system as they can gather and evaluate the 
training data. 

Finally, a certain consulting competence is essential. Many clients might not be 
aware of the pros and cons of using machine translation systems for the trans- 
lation process. Even clients that often work with translation professionals and 
language service providers might not be aware of risks and strategic processes 
as PE has only recently become established on the market. Hence, a post-editor
has to inform the customer or project manager about potential risks as well as
problem-solving strategies, i.e. the risk assessment should enable
the post-editor to give advice on these questions, even if the post-editor is not
fully responsible for the decisions regarding the overall project. A post-editor
should be able to voice his/her concern if a decision seems too risky or suggest
the use of an MT system if it seems plausible for the job to avoid regret in hindsight.
This kind of support should also be included in price calculations. The consult- 
ing competence goes hand in hand with risk assessment and service competences 
that are part of what we call PE soft skills. The risk assessment competence is 
very important for judging the project and supporting the client, as already dis- 
cussed in §7 and §8. Depending on the position of the post-editor in the whole 
PE process, the post-editor might be more or less responsible for the decisions 
concerning the use of MT. However, every post-editor should have at least some
knowledge about potential risks associated with using MT systems in the PE
process in order to support the clients and raise doubts if certain decisions seem
risky or implausible (Kenny et al. 2020). In the field of PE, service competence
means that the post-editor should be able to calculate prices competently, con- 
sciously, and transparently considering the quality of the MT output and the 
necessary PE effort, even though measuring and estimating PE effort is challeng- 
ing (there has been a lot of research in this area, see e.g. Specia 2011; Moorkens et 
al. 2015; Schaeffer & Carl 2014). However, it should become easier to predict the 
PE effort with increasing experience. Further, as the market is still developing, 
professional associations will be able to give more information on this topic in 
the future, which will facilitate a career entry. Another aspect of this competence 
includes handling state-of-the-art CAT and revision tools as well as integrated 
MT systems. The post-editor must be able to post-edit the texts efficiently
according to the client’s guidelines and must be able to adapt to the client’s quality
expectations. Best practice would be to immediately save the post-edited or
finally revised text in the TM or in the training database.

In summary, the post-editor should know the translation market, including all 
aspects of MT and PE, and should be able to negotiate with the customer on an 
equal footing. The post-editor should be able to match the needs of the customer 
with the set-up and conditions of the PE task as well as with the resources avail- 
able to be able to make an appropriate offer that calculates a realistic time and 
cost frame for the job. 


In our house of PE competence, the three pillars are framed by a roof, which 
represents soft skills for post-editors. As described in a similar way in the mod- 
els by PACTE (2003) and Robert et al. (2017), a PE task is also influenced by its 
surrounding factors such as: 


• psycho-physiological components,
• an affinity towards the latest technological developments,
• the PE brief including guidelines for the PE task,
• the post-editor’s self-perception.


Some of the psycho-physiological components are especially important for 
post-editors, such as a well-developed ability to concentrate and sustain attention 
(especially in the case of repeated mistakes in the MT output), stress-resistance, 
logical reasoning, analytical thinking, and quick-wittedness. An affinity for
working with the latest technological developments is an essential requirement for
working as a post-editor, because PE tasks always go hand in hand with MT and
CAT tools.

A PE job can only be accomplished successfully if the post-editor knows the
target audience, the skopos of the target text, and the effort that needs to go into
the PE task (light vs. full PE, etc., see §4), all of which should be summarised in a
PE brief. Further, it has to be clear what the post-editor’s responsibilities are (e.g.
maintaining the translation memory and terminology management systems,
reporting or even correcting flaws of the MT system).

The broader perspective of the role and responsibilities of the post-editor should 
be accompanied by a new self-perception and appropriate professional ethics. 
When the market began to change, translators perceived PE as a mediocre task 
for quite a while (Guerberof Arenas 2013). However, post-editors should perceive
themselves not as mere proof-readers of MT output, but as competent language
consultants and experts in creating PE processes to establish PE as a professional
task in its own right. As such, they should take responsibility for the
successful creation of the target text. This also requires new professional ethics that
still need to be conceptualised. The new ethos should incorporate elements such 
as the willingness of post-editors to accept that sometimes the quality of the tar- 
get text does not have to be 100% to still fulfil the purpose of the target text, i.e. 
they deliver fit-for-purpose post-edited texts (Bowker 2020a). In addition, post- 
editors (as well as revision experts) should be able to resist the urge to correct 
text units that do not need corrections just to prove that they are competent pro- 
fessionals who are indispensable to the market (cf. also Mossop 2019 and Vardaro 
et al. 2019). 

The model we presented above shows many similarities to translation and re- 
vision models. As the basis of our PE model are general translation competences, 
we argue that translation and PE should not be trained separately, but PE should 
be added to modern translation curricula (Bernardini et al. 2020). We recommend 
integrating PE in late B.A. translation programmes or even only in M.A. studies 
so that a general translation competence has already been developed to a certain
degree.


9.2 Job profiles 


The model from §9.1 is a general PE model. However, the competences
we presented there can play either a major or a minor role, depending on
the specialisation a post-editor wants to follow. As a consequence, the pillars in 
the model can vary in importance depending on the specific competences needed 
for the respective job profile. Let us discuss three possible job perspectives for 
post-editors with the help of this model. But remember that these are only sug- 
gestions and individual specialisations can be combined to different degrees. 


9.2.1 PE competences for post-editors 


First, we will focus on the most obvious specialisation, namely practical PE. The 
post-editor is actively responsible for handling the MT output in respect to the 
source text. The main focus is therefore error handling (Figure 9.2), which thus 
represents the most important pillar in our competence house. Error handling 
combines different sub-tasks. Post-editors must be very skilled in error spotting
and error classification, meaning that they have to be able to decide whether
an error has to be corrected according to the guidelines, and of course in
correcting the errors efficiently. The post-editor is dependent on a well-organised
PE process, including a PE brief, comprehensive PE guidelines, a well-functioning 
MT system, the integration into a translation memory environment, maybe a 
client-specific terminology database, etc. As not all clients will be familiar with 
the professional handling of PE tasks, it is helpful for the post-editor to have 
knowledge about the inner workings of MT systems for the post-editing process, 
e.g. for error spotting and characterisation. In addition, it is probably helpful if 
they are able to consult clients, especially if they work as freelancers, but those 
competences have a lower priority. The risk assessment and service competences
are necessary for the same reasons, but play a minor role, because post-editors
are the executing part of the PE process.

Figure 9.2: PE competences for post-editors


9.2.2 PE competences for MT engineers 


A second job perspective might be called MT engineering (Figure 9.3). MT engi- 
neers are more focused on the technology. They have in-depth knowledge about 
how to train and maintain MT engines as well as deep knowledge of MT struc- 
tures, and how to improve and evaluate the MT output. They are responsible for 
questions like what system is suitable for the job or for the company and what
kind of data are needed for training. Further, they should have a lot of knowledge
about (other) CAT tools like translation memory and terminology management 
systems, so that they can implement the technological support that best fits the 
respective project. To achieve these goals, they also need basic knowledge of
error handling and client consulting to be able to assess the post-editors’ needs and
the project’s requirements. Their risk assessment and service competences are
an integral part of their job and can be judged as intermediate compared to the
other job profiles.

Figure 9.3: PE competences for MT engineers


9.2.3 PE competences for PE consultants 


Finally, one job perspective for post-editors can be to work in a consulting posi- 
tion (Figure 9.4), where the main tasks are project and risk management, taking 
charge of the communication between the different stakeholders and making the 
decisions that are necessary to set up a PE project. For this, they naturally need 
basic knowledge on PE practice and MT engineering and they also must have 
profound knowledge about risk assessment and service, because they create the 
projects and decide which texts and projects can be post-edited and which cannot.
They have to make decisions on the aspects we discussed in §8. Concerning the
actors described in the ISO standard in §4, PE consultants can also take
the role of or work closely together with the translation service provider.

Figure 9.4: PE competences for PE consultants


9.3 PE training and education 


Training and education in PE should be in line with the different job profiles
we described in §9.2. As a prospective post-editor, you should specialise in the
aspects of the job profile you are most interested in.

If you do not have any translation experience and knowledge about translation
studies at all, you should consider a university degree in translation or
long-term translation training to gain the relevant translation competences. When
you choose a university/degree/programme, you might already keep in mind 
which job profile seems the most interesting for you and look at the curricula 
to see whether an institution offers PE courses, courses on MT or computational 
linguistics, and/or courses on project management. 

If you are a professional translator, you have already gained a lot of PE knowl- 
edge in this book. You may want to look for courses on advanced training of- 
fered by universities or professional associations. The PE job profile is probably 
the closest to aim for if you want to build on your existing knowledge. Try to 
find a programme or some add-on courses that discuss practical aspects and offer 
some hands-on exercises. If you want to focus on MT engineering, you should try 
to gain some extra knowledge on the technology and might want to find some 
computational linguistics, MT, or artificial intelligence courses. If you want to focus
on consulting, it would be advisable to gain some knowledge about project and
risk management. Some PE practice would be helpful for the MT engineering
and consulting profiles as well.
Crossword puzzle - Chapter 9


Across
2 What competence is needed for PE and translation from scratch that describes the
knowledge about domain-specific contents, e.g. medical knowledge for translating and
post-editing medical texts?
3 What belongs to the error handling competence? Error spotting, error
classification, and error ...
5 What do we call the competence that includes training and assessment of MT
systems? MT ...
7 What is the most prominent competence pillar for the job profile of a post-editor?

Down
1 Which is one of the basic competences for post-editing?
4 Which competence describes that post-editors must be able to inform clients about
the pros and cons of using machine translation systems?
6 The ability to concentrate and sustain attention, stress-resistance, logical
reasoning, analytical thinking, quick-wittedness, an affinity for technologies are
the ... components that are especially important for post-editors.
10 Food for thought and wrap-up


PE has changed the field of professional translation and the market has become 
more diverse with more possibilities to transfer a source text into a target text. 
Hence, professional translators have to adapt to these changes and have to de- 
cide whether they want to broaden their range of services and offer PE. If so, 
many practical aspects have to be considered - many of which we could not ad- 
dress in this short textbook, amongst others because these aspects are often very 
individual. 

One example is price calculation. As in translation, there are many
different ways to calculate prices, e.g. per source/target text character/word/line,
per hour, according to the editing distance, by pricing MT segments like
fuzzy match segments, etc.¹ Reasons for choosing one or the other are
similarly manifold. Hence, it might also be reasonable to decide on each project
individually depending on the given constraints and characteristics. In the end, both
translators and clients should profit from the PE process. Translators should save
time (and hence gain money, not lose money) and clients should save money as
well. You can also find some more information in the TAUS pricing guidelines².
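
If you wonder what a calculation "according to the editing distance" could look
like in practice, the following Python sketch may help. It computes a
character-level Levenshtein distance between the MT output and the post-edited
segment and turns it into a price. The word rate, the minimum share, and the
pricing formula itself are invented for illustration; they are assumptions and do
not come from TAUS, memoQ, or any other pricing scheme.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def edited_share(mt_output: str, post_edited: str) -> float:
    """Edit distance normalised by the longer segment (0.0 means untouched)."""
    longest = max(len(mt_output), len(post_edited))
    if longest == 0:
        return 0.0
    return edit_distance(mt_output, post_edited) / longest


def segment_price(mt_output: str, post_edited: str, words: int,
                  full_rate: float = 0.10, minimum_share: float = 0.3) -> float:
    """Hypothetical price: a fixed share of the word rate plus a part scaled by edits."""
    share = minimum_share + (1 - minimum_share) * edited_share(mt_output, post_edited)
    return round(words * full_rate * share, 4)


if __name__ == "__main__":
    mt = "The post-editor correct the text."
    pe = "The post-editor corrects the text."
    print(edit_distance(mt, pe), segment_price(mt, pe, words=5))
```

The minimum share reflects the idea that even an untouched segment had to be read
and checked. Whether such a scheme is fair for a given project is exactly the kind
of question you have to decide individually.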

As we already briefly discussed in §2, PE has also brought interesting new 
aspects into the field of research, which we cannot discuss in more detail here. 
However, we would like to mention that both practice and research are in close 
dialogue with each other. Guerberof Arenas (2014), for example, presents a study 
on the productivity and quality of PE in a translation memory tool compared to 
fuzzy and no matches. After presenting the results, she also discusses how those 
variables influence the pricing and that the potential benefits of PE jobs vary for 
each translator individually. 


“[...] and it might be difficult to find a satisfactory solution to determine
a “fair” price. In most cases, the translators should really analyze if the
compensation scheme applied for a particular project is beneficial for them
according to the productivity they experience during this job or series of
similar jobs. Moreover, they should also consider if the use of MT and TM
segments might benefit the quality they deliver, as we have seen in this
study.” (Guerberof Arenas 2014: 183)


¹ If you want to learn more about how to generate time or editing distance reports in memoQ,
read this article https://blog.memoq.com/time-tracking-and-editing-distance-reporting, last
accessed on 21 April 2021. The article is well written and might give you interesting insights
even if you do not use memoQ.

² last accessed 11 June 2021


Also, different publications on PE by professional associations, e.g. Ottmann 
(2017) or Porsiel (2017), combine contributions of researchers and professionals. 

Another aspect we only partly, or indirectly, focused on is machine translation
ethics. Ethics is a large and important topic in artificial intelligence (Liao
2020) because, with advancing AI, more and more decisions have to be made. One of
the most famous ethical questions in AI concerns self-driving cars. Although
it is assumed that fewer accidents will happen when human error is eliminated
from traffic, the question remains whom to harm in an emergency situation
with different actors involved (Bonnefon et al. 2016). The ethical dilemmas do
not seem to be that extreme for MT systems. Nonetheless, it seems appropriate 
to discuss this aspect. 

So far, there has been little research on MT ethics. The existing studies have
mainly focused on translating literature (e.g. Taivalkoski-Shilov 2019
or Kenny & Winters 2020), which is of course not the only area that should be
concerned with ethics. Some important issues are raised in Moorkens & Rocchi 
2020. They discuss, for example, that the ownership of data and translations is a 
matter of ethical considerations as large amounts of high-quality human transla- 
tions are needed to train MT systems. The use of the data is often not transpar- 
ent, and it is often not clear who was asked for permission to use the data for MT 
training. Further, MT ethics should also be concerned with reporting the use of
MT and PE, so that the reader knows which MT system, which PE style,
and which post-editor were involved in creating the texts. This should also be
true for domain-specific texts, where usually not even the translators are listed.
Although most readers might not be interested in how the translation was produced,
such reporting might push clients towards fairer processes, as a long list of
different MT systems and post-editors who only light post-edited the MT output
might reflect badly on the client/company.

Additionally, when we talk about highlighting MT use, we might want to talk
about the use of MT on websites, where the MT output is not post-edited. Let’s
take as an example the website of Tripadvisor³, where, among other things, users can
rate hotels, restaurants, etc. The website implements Google Translate and,
according to the locale settings, a translation of the users’ comments is presented
automatically. Under the comments there is a note in an unremarkable, gray
font that the translation was created by Google and that the reader has the
possibility to rate the translation. It is quite likely that many users do not see this
information when they read the comment. From our perspective (leaving aside
all the benefits an MT system provides on such websites), it should be open to
discussion whether this report of MT use is sufficient and whether it is ethical
to present the translation automatically.


³ https://www.tripadvisor.de/, last accessed 10 February 2021

We wanted to finish the book with this little bit of food for thought. Post- 
editing is still a new area and thanks to the ongoing technological developments 
and innovations in artificial intelligence, many changes and new challenges are 
still to come. Now that we are at the end of the book, think about what has
changed in your perception. How do you feel about MT and PE now? You might
want to go back to §1 and §2 and look at the answers you gave at the very begin- 
ning of this short introduction to PE. Do you still agree with your assessment or 
has something changed? 




Solutions to crossword puzzles


Section 2

1. TRANSLATOR
2. EFFORT
3. CATTOOLS
4. PREEDITING
5. RELEVANCE
6. EMPIRICAL
7. EYETRACKING
8. CRITT


Section 3

1. WEAVER
2. GEORGETOWN
3. RUSSIAN
4. ALPAC
5. SYSTRAN
6. WEATHERFORECASTS
7. BABELFISH
8. STATISTICAL
9. INTERLINGUA
10. NEURAL


Section 4

1. ENOUGH
2. SYNTAX
3. OUTPUT
4. TERMINOLOGY
5. MONOLINGUAL
6. FULL


Section 5

1. CONTROLLED
2. CREATIVITY
3. RESTRICTIVE
4. SUBTITLES


Section 6

1. SEGMENT
2. MANAGEMENT
3. TERMINOLOGY
4. FUZZYMATCH
5. EMPTY
6. INTERACTIVE
7. ADAPTIVE


Section 7

1. OPERATIVE
2. STRATEGIC
3. CONFIDENTIAL
4. LIABILITY
5. RISKS


Section 8

1. PREPARATION
2. REVISION
3. SENSITIVITY
4. DISTANT
5. DEFECTIVE
6. READABILITY
7. SPELLING
8. TURNAROUND


Section 9

1. TRANSLATION
2. EXTRALINGUISTIC
3. CORRECTION
4. CONSULTING
5. ENGINEERING
6. PSYCHOPHYSIOLOGICAL
7. ERRORHANDLING
A short guide to post-editing


Artificial intelligence is changing and will continue to change the world we live in. These 
changes are also influencing the translation market. Machine translation (MT) systems 
automatically transfer one language to another within seconds. However, MT systems 
are very often still not capable of producing perfect translations. To achieve high quality 
translations, the MT output first has to be corrected by a professional translator. This pro- 
cedure is called post-editing (PE). PE has become an established task on the professional 
translation market. The aim of this textbook is to provide basic knowledge about the
most relevant topics in professional PE. The textbook comprises ten chapters on both
theoretical and practical aspects including topics like MT approaches and development, 
guidelines, integration into CAT tools, risks in PE, data security, practical decisions in 
the PE process, competences for PE, and new job profiles. 